Music/Voice Separation Using the 2D Fourier Transform

Prem Seetharaman, Fatemeh Pishdadian, Bryan Pardo

This page shows a few audio examples for a source separation approach based on the 2D Fourier Transform.

Audio source separation is the act of isolating sound sources in an audio scene. One application of source separation is singing voice extraction. In this work, we present a novel approach for music/voice separation that uses the 2D Fourier Transform (2DFT). Our approach exploits the way periodic patterns manifest in the 2DFT and is connected to research in biological auditory systems as well as image processing. We find that our system is simple to describe and implement, and that it is competitive with existing unsupervised source separation approaches that rely on similar assumptions.
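
To make the remark about periodic patterns concrete, here is a minimal NumPy sketch on toy data. It is not the separation procedure used by this work or by nussl: the "spectrogram" is synthetic, the repetition period is assumed to be known in advance, and every name below (`pattern`, `comb`, `bg_estimate`, and so on) is hypothetical. It only illustrates that a time-frequency pattern repeating along time puts all of its 2DFT energy into a regular comb of bins along the time axis, so keeping those bins and inverting gives a rough estimate of the repeating part.

```python
import numpy as np

rng = np.random.default_rng(0)

n_freq, period, n_repeats = 16, 8, 6
n_frames = period * n_repeats  # 48 time frames in total

# Toy "background": one random time-frequency pattern tiled along time.
pattern = rng.random((n_freq, period))
background = np.tile(pattern, (1, n_repeats))

# Toy "foreground": a short, non-repeating burst of energy.
foreground = np.zeros((n_freq, n_frames))
foreground[4:8, 20:25] = 1.0

mixture = background + foreground

# 2D Fourier transform of the toy magnitude spectrogram.
ft2d = np.fft.fft2(mixture)

# A pattern that repeats n_repeats times only excites every
# n_repeats-th bin along the time axis of the 2DFT.
comb = np.zeros(n_frames, dtype=bool)
comb[::n_repeats] = True

# Check the claim on the background alone: energy off the comb is ~0.
bg_ft = np.fft.fft2(background)
energy_off_comb = np.sum(np.abs(bg_ft[:, ~comb]) ** 2)
energy_total = np.sum(np.abs(bg_ft) ** 2)

# Keep only the comb bins of the mixture's 2DFT and invert.
bg_estimate = np.real(np.fft.ifft2(ft2d * comb[np.newaxis, :]))
fg_estimate = mixture - bg_estimate

print("fraction of background energy off the comb:", energy_off_comb / energy_total)
print("max background error:", np.abs(bg_estimate - background).max())
```

Because the comb also captures the small part of the burst that happens to line up with the repetition period, the background estimate is approximate rather than exact. With real audio the period is unknown, so a fixed comb like this would have to be replaced by some way of locating the peaks in the 2DFT.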

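from nussl import AudioSignal, FT2D
from audio_embed import utilities
import matplotlib.pyplot as plt
%matplotlib inline

# Load the mixture to separate.
mix = AudioSignal('audio/blondiecallme.mp3')

# Separate with the 2DFT method, then collect the two estimates.
separator = FT2D(mix)
separator.run()  # assumption: this nussl version needs run() before make_audio_signals()
bg, fg = separator.make_audio_signals()

# Present the separated foreground and background estimates.
plt.figure(figsize=(20, 10))
utilities.multitrack([(fg, 'Foreground'), (bg, 'Background')])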