The first open-source model that generates video and audio in a single unified pass. 7 languages, 8-step inference, cinema-grade 2K output.
The first open-source model to unify video and audio generation — setting new standards for the industry.
A 15B-parameter transformer generates video and audio jointly in one pass, with no post-processing needed.
Elo rating of 1333 to 1406 in blind user voting, about 60 points ahead of Seedance 2.0.
DMD-2 distillation eliminates classifier-free guidance. Single-GPU deployment is now possible.
Mandarin, Cantonese, English, Japanese, Korean, German, and French, with a word error rate of only 14.60%.
Cinema-grade quality up to 2K. Built-in super-resolution module for further upscaling.
Base model, distilled model, super-resolution module, and inference code are all released.
Drag an image or type your video idea
AI creates video + audio in one pass
Get high-quality output instantly
Flexible plans for creators. No hidden fees.
For light and occasional use
Built for professional creators