Woosh: A Sound Effects Foundation Model

Gaetan Hadjeres1, Marc Ferras1, Khaled Koutini1, Benno Weck1, Alexandre Bittar1, Thomas Hummel1, Zineb Lahrichi1, Hakim Missoum1, Joan Serrà1, Yuki Mitsufuji1,2
1Sony AI
2Sony Group Corporation

Woosh-Flow Private: Text-to-audio Generation

We compare Woosh-Flow-Private against Woosh-Flow-Public and two baselines, SAO and TangoFlux, on five text prompts.

Sample 1: Electric coffee grinder grinding beans for single cup (Spectrograms)
SAO
TangoFlux
Woosh-Flow-Public
Woosh-Flow-Private


Sample 2: Strike on a timpani while using the pitch pedal (Spectrograms)
SAO
TangoFlux
Woosh-Flow-Public
Woosh-Flow-Private


Sample 3: Water is splashing while paddling from a kayak (Spectrograms)
SAO
TangoFlux
Woosh-Flow-Public
Woosh-Flow-Private


Sample 4: Turning the crank of a pencil sharpener machine without a pencil inside (Spectrograms)
SAO
TangoFlux
Woosh-Flow-Public
Woosh-Flow-Private


Sample 5: Dark computer voice saying hi (Spectrograms)
SAO
TangoFlux
Woosh-Flow-Public
Woosh-Flow-Private


BibTeX

@misc{hadjeres2026,
   title={Woosh: A Sound Effects Foundation Model},
   author={Gaetan Hadjeres and Marc Ferras and Khaled Koutini and Benno Weck and Alexandre Bittar and Thomas Hummel and Zineb Lahrichi and Hakim Missoum and Joan Serrà and Yuki Mitsufuji},
   year={2026},
   eprint={2412.15322},
   archivePrefix={arXiv},
   primaryClass={cs.CV},
   url={https://arxiv.org/abs/2412.15322},
   }