Junhao Chen1, Mingjin Chen2, Jianjin Xu3, Xiang Li4, Junting Dong5†,
Mingze Sun1, Puhua Jiang1, Hongxiang Li4, Yuhang Yang6,
Hao Zhao1, Xiaoxiao Long7, Ruqi Huang1†
1Tsinghua University, 2Beijing Normal–Hong Kong Baptist University, 3Carnegie Mellon University,
4Peking University, 5Shanghai AI Laboratory, 6University of Science & Technology of China, 7Nanjing University
DanceTog generates videos with interactive details from a single reference image and independent pose and mask sequences, maintaining consistent identities even when multiple individuals frequently exchange positions.
Given conditions estimated from a single driving video, DanceTog can transfer them to different reference images, generating complex interactive motions and character details (e.g., skirts, hair).
@article{chen2025dancetogether,
title={DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation},
author={Junhao Chen and Mingjin Chen and Jianjin Xu and Xiang Li and Junting Dong and Mingze Sun and Puhua Jiang and Hongxiang Li and Yuhang Yang and Hao Zhao and Xiaoxiao Long and Ruqi Huang},
year={2025},
eprint={2505.18078},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.18078},
}