Woosh: A Sound Effects Foundation Model

Hadjeres, Gaetan; Ferras, Marc; Koutini, Khaled; Weck, Benno; Bittar, Alexandre; Hummel, Thomas; Lahrichi, Zineb; Missoum, Hakim; Serrà, Joan; Mitsufuji, Yuki


Gaetan Hadjeres¹, Marc Ferras¹, Khaled Koutini¹, Benno Weck¹, Alexandre Bittar¹, Thomas Hummel¹, Zineb Lahrichi¹, Hakim Missoum¹, Joan Serrà¹, Yuki Mitsufuji¹,²

¹Sony AI
²Sony Group Corporation


Woosh-Flow-Private: Text-to-Audio Generation

We compare our private model, Woosh-Flow-Private, against Woosh-Flow-Public and two baselines, SAO and TangoFlux, on five text prompts.

Sample 1: Electric coffee grinder grinding beans for single cup (Spectrograms)

SAO

TangoFlux

Woosh-Flow-Public

Woosh-Flow-Private

Sample 2: Strike on a timpani while using the pitch pedal (Spectrograms)

SAO

TangoFlux

Woosh-Flow-Public

Woosh-Flow-Private

Sample 3: Water is splashing while paddling from a kayak (Spectrograms)

SAO

TangoFlux

Woosh-Flow-Public

Woosh-Flow-Private

Sample 4: Turning the crank of a pencil sharpener machine without a pencil inside (Spectrograms)

SAO

TangoFlux

Woosh-Flow-Public

Woosh-Flow-Private

Sample 5: Dark computer voice saying hi (Spectrograms)

SAO

TangoFlux

Woosh-Flow-Public

Woosh-Flow-Private

BibTeX

@misc{hadjeres2026,
   title={Woosh: A Sound Effects Foundation Model},
   author={Gaetan Hadjeres and Marc Ferras and Khaled Koutini and Benno Weck and Alexandre Bittar and Thomas Hummel and Zineb Lahrichi and Hakim Missoum and Joan Serrà and Yuki Mitsufuji},
   year={2026},
   eprint={2604.01929},
   archivePrefix={arXiv},
   primaryClass={cs.SD},
   url={https://arxiv.org/abs/2604.01929},
}