Headed by Prof. Bryan Pardo, the Interactive Audio Lab is in the Computer Science Department of Northwestern University. We develop new methods in Machine Learning, Signal Processing, and Human-Computer Interaction to make new tools for understanding and manipulating sound.
Ongoing research in the lab is applied to audio scene labeling, audio source separation, inclusive interfaces, new audio production tools, and machine audition models that learn without supervision. For more, see our projects page.
Latest News
New papers at Interspeech and ISMIR
Sep 1, 2024
Max Morrison defends dissertation
Jun 1, 2024
Three papers accepted to ICASSP
Feb 1, 2024
TorchCrepe pitch tracker: 2,000,000+ downloads
Oct 18, 2023
Try the VampNet demo: Music Generation via Masked Acoustic Token Modeling.
Jul 1, 2023
Our tech inside Adobe's new AI-powered audio editor
Jun 1, 2023
$440K grant from NSF: Engaging Blind and Visually Impaired Youth in Computer Science through Music Programming
Jun 1, 2023
Lexie B2 hearing aids use our tech
Dec 1, 2022
$100K grant from Sony to fund speech generation
Oct 1, 2022
$1.8 million Future of Work award from NSF
Sep 12, 2022
Projects
-
MaskMark - Robust Neural Watermarking for Real and Synthetic Speech
Patrick O'Reilly, Zeyu Jin, Jiaqi Su, Bryan Pardo
High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can combat misuse by embedding a traceable signature in generated audio. However, existing audio watermarks typically demonstrate robustness to only a small set of transformations of the watermarked audio. To address this, we propose MaskMark, a neural network-based digital audio watermarking technique optimized for speech.
-
VampNet - Music Generation via Masked Acoustic Token Modeling
Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
We introduce VampNet, a masked acoustic token modeling approach to music audio generation. VampNet lets us sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. With appropriate prompting, VampNet can perform music compression, inpainting, outpainting, continuation, and looping with variation (vamping). This makes VampNet a powerful music co-creation tool.
-
Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models
Patrick O'Reilly, Andreas Bugler, Keshav Bhandari, Max Morrison, Bryan Pardo
As governments and corporations adopt deep learning systems to apply voice ID at scale, concerns about security and privacy naturally emerge. We propose a neural network model capable of imperceptibly modifying a user's voice in real time to prevent speaker recognition systems from identifying them.