Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching

Diverse Character Styles

Livatar supports a wide variety of character styles, including anime, cartoon, photorealistic, and oil-painting aesthetics.

Diverse Facial Orientations

Livatar can animate reference images with diverse facial orientations, including frontal views, profiles, and turned heads.

Diverse Character Ages

Livatar supports character animation across a wide range of ages and genders, including elderly, middle-aged, young-adult, and adolescent subjects.

We present Livatar, a real-time, audio-driven framework for generating talking-head videos. Existing baselines suffer from limited lip-sync accuracy and long-term pose drift. We address these limitations with a framework based on flow matching. Combined with system-level optimizations, Livatar achieves competitive lip-sync quality, with a LipSync Confidence score of 8.50 on the HDTF dataset, and reaches a throughput of 141 FPS with an end-to-end latency of 0.17 s on a single NVIDIA A10 GPU. This makes high-fidelity avatars accessible to a broader range of applications. Our project is available at https://www.hedra.com/
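For readers unfamiliar with flow matching, the following is a minimal sketch of the generic technique on a toy 1-D distribution. It is not Livatar's tailored formulation, which is not described here. It assumes the common linear (rectified-flow style) path x_t = (1 - t) x0 + t x1 between Gaussian noise x0 and data x1. A velocity model v(x, t) is regressed onto the target x1 - x0, and samples are then drawn by integrating dx/dt = v(x, t) from t = 0 to t = 1. The function names (`interpolate`, `fit_velocity`, `sample`) are hypothetical. A per-time-bin linear model fitted by least squares stands in for the neural network a real system would use.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Linear probability path x_t = (1 - t) * x0 + t * x1 and its target velocity x1 - x0."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0


def fit_velocity(data, num_bins=50, seed=0):
    """Fit a toy velocity model v(x, t) = a_k * x + b_k, one (a_k, b_k) per time bin k.

    Minimizing the flow-matching loss mean((v(x_t, t) - u)^2) over (a_k, b_k)
    is ordinary least squares. This linear model is a stand-in for a neural network.
    """
    rng = np.random.default_rng(seed)
    x1 = data
    x0 = rng.standard_normal(x1.shape)    # noise samples from N(0, 1)
    t = rng.uniform(0.0, 1.0, x1.shape)   # one random time per (noise, data) pair
    xt, u = interpolate(x0, x1, t)
    bins = np.minimum((t * num_bins).astype(int), num_bins - 1)
    coeffs = np.zeros((num_bins, 2))
    for k in range(num_bins):
        mask = bins == k
        design = np.stack([xt[mask], np.ones(mask.sum())], axis=1)
        coeffs[k], *_ = np.linalg.lstsq(design, u[mask], rcond=None)
    return coeffs


def sample(coeffs, n, seed=1):
    """Integrate dx/dt = v(x, t) from t = 0 to 1 with one Euler step per time bin."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / len(coeffs)
    x = rng.standard_normal(n)            # start from noise at t = 0
    for a, b in coeffs:
        x = x + dt * (a * x + b)
    return x


# Usage: learn to transport N(0, 1) noise to a toy N(3.0, 0.5^2) "data" distribution.
rng = np.random.default_rng(42)
data = rng.normal(3.0, 0.5, size=200_000)
coeffs = fit_velocity(data, num_bins=50)
samples = sample(coeffs, 100_000)
print(samples.mean(), samples.std())  # expected to be roughly 3.0 and 0.5
```

Because the ODE is deterministic, sampling cost is set directly by the number of integration steps. Reducing the step count is one general lever for real-time generation, though the source does not state how Livatar reaches its reported throughput.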