Skip to main content

VampNet - Music Generation via Masked Acoustic Token Modeling

System description

Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo

This work supported by NSF Award Number 2222369


We introduce VampNet, a masked acoustic token modeling approach to music audio generation. VampNet lets us sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. Prompting VampNet appropriately, enables music compression, inpainting, outpainting, continuation, and looping with variation (vamping). This makes VampNet a powerful music co-creation tool.

VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes (compared to hundreds in the autoregressive approach), VampNet generates coherent high-fidelity musical waveforms. For more, try the demo listen to our audio examples, read the paper, peruse the sourcecode, or just watch the video.

Demo Audio Sourcecode

[pdf] H. Flores, P. P. Seetharaman, R. Kumar, and B. Pardo, “VampNet: Music Generation via Masked Acoustic Token Modeling,” International Society of the Society of Music Information Retrieval (ISMIR), 2023.