📝 Publications

A full publication list is available on [Google Scholar] and [Semantic Scholar].

(*: equal contribution; †: corresponding authors.)

Video Generation

A Survey on Video Diffusion Models
Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang
ACM Computing Surveys (CSUR, IF=23.8), 2024
[Paper][HomePage][Zhihu][机器之心][量子位]
Surveys 300+ recent papers on video generation and editing with diffusion models. The GitHub repository has 2000+ stars.

Video Generation

SimDA: A Simple Diffusion Adapter for Efficient Video Generation
Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Paper][HomePage]

Video Generation

StableAnimator: High-Quality Identity-Preserving Human Image Animation
Shuyuan Tu, Zhen Xing, Xintong Han, Zhi-Qi Cheng, Qi Dai, Chong Luo, Zuxuan Wu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Paper][Code][HomePage][机器之心]
The GitHub repository has 1200+ stars.

Video Generation

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li*, Zhen Xing*, Rui Wang, Hui Zhang, Zuxuan Wu
Technical Report, 2025
[Paper][Code][HomePage][量子位]

Video Generation

GenRec: Unifying Video Generation and Recognition with Diffusion Models
Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
[Paper]

Video Generation

AID: Adapting Image2Video Diffusion Models for Instruction-based Video Prediction
Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang
Technical Report, 2024
[Paper][HomePage]

Video Editing

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
Zhen Xing, Qi Dai, Zihao Zhang, Hui Zhang, Han Hu, Zuxuan Wu, Yu-Gang Jiang
Technical Report, 2024
[Paper][HomePage][Zhihu]

Video Understanding

SVFormer: Semi-supervised Video Transformer for Action Recognition
Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper][Code]

3D Understanding

PanoSwin: a Pano-style Swin Transformer for Panorama Understanding
Zhixin Ling, Zhen Xing, Manliang Cao, Xiangdong Zhou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[Paper][Code]

3D Generation

Semi-supervised Single-view 3D Reconstruction via Prototype Shape Priors
Zhen Xing, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang
European Conference on Computer Vision (ECCV), 2022
[Paper][Code]

3D Generation

Few-shot Single-view 3D Reconstruction with Memory Prior Contrastive Network
Zhen Xing, Yijiang Chen, Zhixin Ling, Xiangdong Zhou, Yu Xiang
European Conference on Computer Vision (ECCV), 2022
[Paper][Code]

Image Retrieval

Conditional Stroke Recovery for Fine-Grained Sketch-Based Image Retrieval
Zhixin Ling, Zhen Xing, Jian Zhou, Xiangdong Zhou
European Conference on Computer Vision (ECCV), 2022
[Paper][Code]

  • AdaDiff: Adaptive Step Selection for Fast Diffusion
    Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang
    AAAI, 2025, [Paper]

  • Advancing Dark Action Recognition via Modality Fusion and Dark-to-Light Diffusion Model
    Yuxuan Wang, Zhen Xing, Zuxuan Wu
    ICASSP, 2025, [Paper]

  • Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
    Sicheng Xie, Haidong Cao, Zejia Weng, Zhen Xing, Shiwei Shen, Jiaqi Leng, Xipeng Qiu, Yanwei Fu, Zuxuan Wu, Yu-Gang Jiang
    arXiv, 2025, [Paper]

  • Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
    Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo
    NeurIPS, 2024, [Paper], [HomePage]

  • FDGaussian: Fast Gaussian Splatting via Geometric-aware Diffusion Model
    Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
    arXiv, 2024, [Paper], [HomePage]

  • TranSFormer: Slow-Fast Transformer for Machine Translation
    Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu
    ACL (Findings), 2023, [Paper]

  • Multi-Level Region Matching for Fine-Grained Sketch-Based Image Retrieval
    Zhixin Ling, Zhen Xing, Jiangtong Li, Li Niu
    ACM MM, 2022, [Paper]

  • 3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation
    Zhidan Liu, Zhen Xing, Xiangdong Zhou, Yijiang Chen, Guichun Zhou
    ICMR, 2022, [Paper]

  • CaSS: A Channel-aware Self-supervised Representation Learning Framework for Multivariate Time Series Classification
    Yijiang Chen, Xiangdong Zhou, Zhen Xing, Zhidan Liu, Minyang Xu
    DASFAA, 2022, [Paper]

  • From Coarse to Fine: Hierarchical Structure-aware Video Summarization
    Wenxu Li, Gang Pan, Chen Wang, Zhen Xing, Zhenjun Han
    TOMM, 2022, [Paper]