Woosh-Flow Private: Text-to-audio Generation
We compare our private Woosh-Flow model against Woosh-Flow Public and other baselines.
Sample 1: Electric coffee grinder grinding beans for single cup
Sample 2: Strike on a timpani while using the pitch pedal
Sample 3: Water is splashing while paddling from a kayak
Sample 4: Turning the crank of a pencil sharpener machine without a pencil inside
Sample 5: Dark computer voice saying hi
BibTeX
@misc{hadjeres2026,
title={Woosh: A Sound Effects Foundation Model},
author={Gaetan Hadjeres and Marc Ferras and Khaled Koutini and Benno Weck and Alexandre Bittar and Thomas Hummel and Zineb Lahrici and Hakim Missoum and Joan Serrà and Yuki Mitsufuji},
year={2026},
eprint={2412.15322},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.15322},
}