MMAudio — Video-to-Audio Synthesis

Project page: https://hkchengrex.com/MMAudio/
Code: https://github.com/hkchengrex/MMAudio

Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander Schwing, Yuki Mitsufuji

University of Illinois Urbana-Champaign, Sony AI, and Sony Group Corporation

CVPR 2025

NOTE: Processing high-resolution videos (more than 384 px on the shorter side) takes longer and does not improve results.

The model was trained on 8-second videos. Much longer or much shorter videos will degrade performance; durations of roughly 5 to 12 seconds should be fine.
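The two notes above can be turned into a quick pre-upload check. The following is a minimal sketch and is not part of MMAudio: the helper names `fit_shorter_side` and `duration_ok` are hypothetical, and the only values taken from this page are the 384 px shorter-side threshold and the rough 5 to 12 second range. Rounding to even dimensions is an added assumption, since many video encoders expect even widths and heights.

```python
from typing import Tuple


def fit_shorter_side(width: int, height: int, target: int = 384) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side is at most `target` px.

    Hypothetical helper: the 384 px default comes from the note above.
    Aspect ratio is preserved; videos already small enough are unchanged.
    """
    shorter = min(width, height)
    if shorter <= target:
        return width, height
    scale = target / shorter
    # Assumption: round to even sizes, which many video encoders expect.
    new_w = max(2, round(width * scale / 2) * 2)
    new_h = max(2, round(height * scale / 2) * 2)
    return new_w, new_h


def duration_ok(seconds: float, low: float = 5.0, high: float = 12.0) -> bool:
    """Return True if the clip length is inside the rough 5 to 12 s range."""
    return low <= seconds <= high
```

For example, a 1920x1080 clip would be reduced so that its shorter side is 384 px before upload, and a 30-second clip would be flagged for trimming. The actual resizing and trimming would be done with a video tool of your choice.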

Examples
Prompt | Negative prompt | Seed (-1: random) | Num steps | Guidance strength | Duration (sec)