Haiyang Liu

haiyangliu1997@gmail.com

Hi! I'm currently a final-year PhD student in Information Science and Technology at The University of Tokyo. I work on Human Video Generation and Motion Generation, using multi-modal conditions such as speech, text scripts, keypoints, and images. My work primarily focuses on the body, but also includes the face. I received my M.E. from Waseda University in September 2020, and my B.E. from Southeast University in September 2019.

During my PhD, I have interned (in chronological order) at Hedra Research, Adobe Research, and CyberAgent AI Lab for human video generation, and at Huawei Research Tokyo for motion generation.

I'm interested in impact-driven research problems in human video and motion generation, and I have great respect for simple yet effective ideas, such as Associative Embedding, ZeroConv, and Phase AutoEncoder. I'm grateful that we stand on the shoulders of giants, such as pretrained models, and I hope to contribute back to the community.

(New) Seeking full-time positions starting in 2025.

Research interests

  • Multi-Modal Understanding for Human Motion
  • Multi-Modal Generation for Human Motion
  • Full-Body Human Video Generation
  • Streaming for Video & Motion Generation

Selected Publications

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fa-Ting Hong, Hsin-Ping Huang, Yi Zhou, Yang Zhou

arXiv preprint arXiv:2503.20218, 2025

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Haiyang Liu, Xingchao Yang, Tomoya Akiyama, Yuantian Huang, Qiaoge Li, Shigeru Kuriyama, Takafumi Taketomi

International Conference on Learning Representations, ICLR (Oral), 2025

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu*, Zihao Zhu*, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black

Computer Vision and Pattern Recognition, CVPR, 2024

BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gesture Synthesis

Haiyang Liu, Zihao Zhu, Naoya Iwamoto, Yichen Peng, Zhengqing Li, You Zhou, Elif Bozkurt, Bo Zheng

European Conference on Computer Vision, ECCV, 2022

DisCo: Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gesture Synthesis

Haiyang Liu, Naoya Iwamoto, Zihao Zhu, Zhengqing Li, You Zhou, Elif Bozkurt, Bo Zheng

ACM Multimedia, ACMMM, 2022

Collaborated or Advised Works

  • Video-Gen: FVHuman
  • Motion-Gen: GestureLSM, SyncTalker, FastTalker, HQGesture, EGGesture

Internship Experience

  • Hedra Research: human video generation
  • Adobe Research: human video generation
  • CyberAgent AI Lab: human video generation
  • Huawei Research Tokyo: motion generation

Academic Services

  • Reviewer: SIGGRAPH, SIGGRAPH Asia, Eurographics, ICLR, CVPR, ICCV, ECCV, ACMMM
  • Talks: TechBeat (Beijing, 2022); Virtual Computing (Kyoto, 2022); Huawei (Tokyo, 2024)

Thanks

  • “Take it easy with our papers; you should prioritize things that will impact you over the next few years.” from Yang Zhou.
    - Learning from the best mentor, who considers the feelings and futures of his interns and friends.
  • “I’m perfectly fine either way — you know best. And if we postpone, I’ll have more time to advise.” from Michael J. Black.
    - This came at a stressful and frustrating moment before a deadline, while we were deciding whether to push for CVPR or postpone to SIGGRAPH.
  • “I could personally support you with $200, and I will also ask others to support you as well.” from Naoya Iwamoto.
    - This happened when, for the third time, the company did not have the budget for a conference.
  • “If you need more 8×A100 GPUs, you can use Google Cloud, and I approve that.” from Takafumi Taketomi.
    - Despite the limited budget and number of GPUs at AI Lab, he still gave me maximum flexibility for research.