
Headed by Prof. Bryan Pardo, the Interactive Audio Lab is in the Computer Science Department of Northwestern University. We develop new methods in Generative Modeling, Signal Processing, and Human-Computer Interaction to build tools for understanding, creating, and manipulating sound.

Ongoing research in the lab is applied to the generation of music and speech, audio scene labeling, audio source separation, inclusive interfaces, new audio production tools, and machine audition models that learn without supervision. For more, see our projects page.


Projects

  • Sketch2Sound - Controllable Audio Generation via Time-Varying Signals and Sonic Imitations

    Hugo Flores Garcia, Oriol Nieto, Justin Salamon, Bryan Pardo, Prem Seetharaman

    In collaboration with Adobe, we present Sketch2Sound, a generative audio model that creates high-quality sounds from text prompts combined with a set of interpretable time-varying control signals: loudness, brightness, and pitch. Sketch2Sound can synthesize arbitrary sounds from sonic imitations (i.e., a vocal imitation or a reference sound-shape).

  • Text2FX - Harnessing CLAP Embeddings for Text-Guided Audio Effects

    Annie Chu, Patrick O'Reilly, Julia Barnett, Bryan Pardo

    Text2FX leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., “make this sound in-your-face and bold”).

  • MaskMark - Robust Neural Watermarking for Real and Synthetic Speech

    Patrick O'Reilly, Zeyu Jin, Jiaqi Su, Bryan Pardo

    High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can combat misuse by embedding a traceable signature in generated audio. However, existing audio watermarks typically demonstrate robustness to only a small set of transformations of the watermarked audio. To address this, we propose MaskMark, a neural network-based digital audio watermarking technique optimized for speech.
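The Sketch2Sound summary above names three time-varying control signals: loudness, brightness, and pitch. As a rough illustration only, the NumPy sketch below extracts frame-wise proxies for them from a mono signal: RMS level in dB for loudness, spectral centroid for brightness, and an autocorrelation peak for pitch. These extractors, the frame and hop sizes, and the function names `frame_signal` and `control_signals` are our own assumptions for illustration, not necessarily what Sketch2Sound uses.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def control_signals(x, sr, frame_len=2048, hop=512, fmin=50.0, fmax=1000.0):
    """Frame-wise loudness, brightness and pitch proxies (illustrative choices only,
    not Sketch2Sound's actual feature extractors)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)

    # Loudness proxy: RMS level of each windowed frame, in dB.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    loudness_db = 20.0 * np.log10(rms + 1e-10)

    # Brightness proxy: spectral centroid (magnitude-weighted mean frequency, Hz).
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid_hz = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)

    # Pitch proxy: strongest autocorrelation peak with lag in [sr/fmax, sr/fmin).
    # Autocorrelation is computed via FFT, zero-padded to 2*frame_len so the
    # first frame_len lags match the linear (non-circular) autocorrelation.
    # No voiced/unvoiced decision is made: every frame gets a pitch value.
    power = np.abs(np.fft.rfft(frames, n=2 * frame_len, axis=1)) ** 2
    acf = np.fft.irfft(power, n=2 * frame_len, axis=1)[:, :frame_len]
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lags = lag_min + np.argmax(acf[:, lag_min:lag_max], axis=1)
    pitch_hz = sr / lags

    return loudness_db, centroid_hz, pitch_hz
```

For a 220 Hz test tone, the pitch track should sit near 220 Hz (quantized to integer lags). Scaling the input amplitude shifts the loudness curve by a constant number of dB while leaving brightness and pitch unchanged, which is what makes such curves usable as independent, interpretable controls.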
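The Text2FX summary above describes steering audio effects with a text prompt through CLAP embeddings and differentiable DSP. One way to picture that kind of loop, under heavy simplifying assumptions, is gradient ascent on effect parameters so that an embedding of the processed audio moves toward a text embedding. In the NumPy sketch below, `toy_embed` is a hypothetical stand-in for a CLAP audio encoder (it only returns per-band energy fractions), the target vector plays the role of a made-up "text embedding" in that same space, the three-band EQ in `apply_eq` is hypothetical, and central finite differences stand in for the automatic differentiation a differentiable-DSP implementation would use. It is not the Text2FX code.

```python
import numpy as np

# Hypothetical 3-band split (Hz); not the effects used by Text2FX.
BAND_EDGES = [(0.0, 500.0), (500.0, 3000.0), (3000.0, np.inf)]

def apply_eq(x, sr, gains_db):
    """Toy zero-phase EQ: scale each FFT band by its gain in dB."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    for (lo, hi), g in zip(BAND_EDGES, gains_db):
        band = (freqs >= lo) & (freqs < hi)
        spec[band] *= 10.0 ** (g / 20.0)
    return np.fft.irfft(spec, n=len(x))

def toy_embed(x, sr):
    """Hypothetical stand-in for a CLAP audio embedding: energy fraction per band."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    e = np.array([power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BAND_EDGES])
    return e / e.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def optimize_gains(x, sr, text_emb, steps=150, lr=5.0, eps=1e-3):
    """Gradient ascent on EQ gains to maximize cosine(embed(EQ(x)), text_emb)."""
    gains = np.zeros(len(BAND_EDGES))

    def score(g):
        return cosine(toy_embed(apply_eq(x, sr, g), sr), text_emb)

    for _ in range(steps):
        # Central finite differences stand in for autodiff through the effect.
        grad = np.array([
            (score(gains + eps * d) - score(gains - eps * d)) / (2 * eps)
            for d in np.eye(len(gains))
        ])
        gains = np.clip(gains + lr * grad, -24.0, 24.0)
    return gains, score(gains)
```

With a made-up "bright" target such as `[0.1, 0.2, 0.7]`, the optimizer raises the high band relative to the low band. Because the toy embedding depends only on relative band energies, only gain differences matter here; with a real audio-text embedding model, the prompt and the learned embedding space, rather than a hand-written target vector, would determine where the parameters end up.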

Full List of Projects