Haiyang Liu
I'm a researcher working on multimodal AI models for video, 3D animation, and robotics. My work primarily focuses on generating human videos and 3D animation from multimodal conditions such as text, audio, and images.
At Hedra, I led research on the real-time talking-head model Livatar-1 and the multimodal video generation model Omnia. I worked hands-on across data, model architecture, and training infrastructure.
Before that, I completed my Ph.D. at The University of Tokyo, with internships at Hedra Research, Adobe Research, CyberAgent AI Lab, and Huawei Research Tokyo. I advised 3D animation research at Alaya Lab.
Real-Time Talking Heads Generation with Tailored Flow Matching
Tailored Diffusion Forcing for Streaming Motion Generation
Controllable Human Motion Video Generation from Music and Motion Tags
Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation
Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling