Eloi Moliner¹, Marco A. Martínez-Ramírez², Junghyun Koo², Wei-Hsiang Liao², Kin Wai Cheuk², Joan Serrà², Vesa Välimäki¹, Yuki Mitsufuji²,³
¹ Acoustics Lab, Department of Information and Communications Engineering, Aalto University, Finland
² Sony AI
³ Sony Group Corporation
| Example | Equal Loudness | E2E-Flow | FxNorm-AutoMix L | MEGAMI I-L (proposed) | Human | Dry Stems: Vocals | Dry Stems: Bass | Dry Stems: Drums | Dry Stems: Other |
|---|---|---|---|---|---|---|---|---|---|
| 2 - Rock | | | | | | | | | |
| 3 - Dance | | | | | | | | | |
| 4 - Disco | | | | | | | | | |
| 5 - Country | | | | | | | | | |
| 6 - Grunge | | | | | | | | | |
| 7 - BritPop | | | | | | | | | |
Because MEGAMI is a probabilistic model, we can sample different mixing variations for the same song. Some examples are shown below.
| Example | | | | |
|---|---|---|---|---|
| 2 - Rock | | | | |
| 3 - Dance | | | | |
| 4 - Disco | | | | |
| 5 - Country | | | | |
| 6 - Grunge | | | | |
| 7 - BritPop | | | | |