Haiyang Liu

haiyangliu1997@gmail.com

Hi! I'm currently a final-year PhD student in Information Science and Technology at The University of Tokyo. I work on Human Video Generation and Motion Generation, using multi-modal conditions such as speech, text scripts, keypoints, and images. My work primarily focuses on the body, but also includes the face. I received my M.E. from Waseda University in September 2020, and my B.E. from Southeast University in September 2019.

During my PhD, I have interned (in chronological order) at Hedra Research, Adobe Research, and CyberAgent AI Lab for human video generation, and at Huawei Research Tokyo for motion generation.

I'm interested in impact-driven research problems in human video and motion generation, and I have great respect for simple yet effective ideas, such as Associative Embedding, ZeroConv, and Phase AutoEncoder. I'm grateful that we stand on the shoulders of giants, such as pretrained models, and I hope to contribute back to the community.

(New) Seeking full-time positions starting in 2025.

Research interests

  • Multi-Modal Understanding for Human Motion
  • Multi-Modal Generation for Human Motion
  • Full-Body Human Video Generation
  • Streaming for Video & Motion Generation

Selected Publications

Video Motion Graphs

Haiyang Liu, Zhan Xu, Fa-Ting Hong, Hsin-Ping Huang, Yi Zhou, Yang Zhou

arXiv preprint arXiv:2503.20218, 2025

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Haiyang Liu, Xingchao Yang, Tomoya Akiyama, Yuantian Huang, Qiaoge Li, Shigeru Kuriyama, Takafumi Taketomi

International Conference on Learning Representations, ICLR (Oral), 2025

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling

Haiyang Liu*, Zihao Zhu*, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black

Computer Vision and Pattern Recognition, CVPR, 2024

BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gesture Synthesis

Haiyang Liu, Zihao Zhu, Naoya Iwamoto, Yichen Peng, Zhengqing Li, You Zhou, Elif Bozkurt, Bo Zheng

European Conference on Computer Vision, ECCV, 2022

DisCo: Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gesture Synthesis

Haiyang Liu, Naoya Iwamoto, Zihao Zhu, Zhengqing Li, You Zhou, Elif Bozkurt, Bo Zheng

ACM Multimedia, ACMMM, 2022

Collaborated or Advised Works

  • Video-Gen: FVHuman
  • Motion-Gen: GestureLSM, SyncTalker, FastTalker, HQGesture, EGGesture

Internship Experience

  • Hedra Research: human video generation
  • Adobe Research: human video generation
  • CyberAgent AI Lab: human video generation
  • Huawei Research Tokyo: motion generation

Academic Services

  • Reviewer: SIGGRAPH, SIGGRAPH Asia, Eurographics, ICLR, CVPR, ICCV, ECCV, ACMMM
  • Talks: TechBeat (Beijing, 2022); Virtual Computing (Kyoto, 2022); Huawei (Tokyo, 2024)

Thanks

  • “Take it easy with our papers; you should prioritize things that will impact you over the next few years.” from Yang Zhou.
    - Learning from the best mentor, who considers the feelings and futures of his interns and friends.
  • “I’m perfectly fine either way — you know best. And if we postpone, I’ll have more time to advise.” from Michael J. Black.
    - This came at a stressful and frustrating moment before a deadline, while we were deciding whether to push for CVPR or postpone to SIGGRAPH.
  • “I could personally support you with $200, and I will also ask others to support you as well.” from Naoya Iwamoto.
    - This happened when, for the third time, the company did not have the budget for a conference.
  • “If you need more 8×A100 GPUs, you can use Google Cloud, and I approve that.” from Takafumi Taketomi.
    - Despite the limited budget and number of GPUs at AI Lab, he still gave me maximum flexibility for research.