Predicting Algorithm Efficacy for Adaptive Multi-Cue Source Separation

Ethan Manilow, Prem Seetharaman, Fatemeh Pishdadian, Bryan Pardo

This page shows an audio example illustrating multi-cue source separation.

The original mixture is divided into two segments:

  1. In the first 10 seconds, the vocals are panned to the left ear and the backgrounds are centered.
  2. In the last 20 seconds, the vocals are panned to the center and overlap spatially with the background.
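The two-segment panning setup above can be sketched with NumPy. This is a minimal illustration only: the sine tones, one-second segment lengths, and gain values are placeholder assumptions, not the actual mixture on this page.

```python
import numpy as np

sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
vocal = np.sin(2 * np.pi * 440 * t)       # placeholder "vocal" source
background = np.sin(2 * np.pi * 220 * t)  # placeholder "background" source

# Segment 1: vocals hard-panned to the left channel, background centered.
left_1 = vocal + 0.5 * background
right_1 = 0.5 * background

# Segment 2: both sources centered, so they overlap spatially
# and the channels carry identical signals.
left_2 = 0.5 * vocal + 0.5 * background
right_2 = 0.5 * vocal + 0.5 * background

stereo = np.stack([np.concatenate([left_1, left_2]),
                   np.concatenate([right_1, right_2])])
```

In the first segment the channels differ (a spatial cue PROJET can exploit); in the second they are identical, so spatial cues alone cannot isolate the vocals.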

Here's the mixture audio.

In [87]:
from audio_embed import utilities
from nussl import AudioSignal

mix = AudioSignal('audio/mix.wav')
In [91]:
from nussl import AlgorithmSwitcher, AudioSignal, RepetSim, Melodia, Projet

separated = {'bg':{}, 'fg':{}}

def separate(mix, approach):
    if approach.__name__ == 'Projet':
        # PROJET separates the mixture directly into a list of sources
        s = approach(mix, num_sources=2, num_iterations=100)
        s.run()
        separated['projet'] = s.make_audio_signals()
    else:
        s = approach(mix)
        s.high_pass_cutoff = 0  # separate over the full frequency range
        s.run()
        separated['bg'][approach.__name__], separated['fg'][approach.__name__] = s.make_audio_signals()

approaches = [Melodia, RepetSim, Projet]

for a in approaches:
    separate(mix, a)

Three algorithms are applied to the mixture, each relying on a different cue:

  • MELODIA* (melody tracking using pitch proximity)
  • REPET-SIM (repetition)
  • PROJET (spatialization)

* The published MELODIA is a predominant pitch tracker, not a source separation algorithm, but we use it to build a harmonic mask to separate out the vocals based on the pitch track.
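One way to build such a harmonic mask from a pitch track, sketched with NumPy: keep only the time-frequency bins near integer multiples of the tracked fundamental. The function name, harmonic count, and frequency tolerance here are illustrative assumptions, not the exact mask used for the estimates below.

```python
import numpy as np

def harmonic_mask(f0_track, freqs, n_harmonics=10, tol_hz=40.0):
    """Binary time-frequency mask keeping bins near harmonics of f0.

    f0_track: per-frame fundamental frequency in Hz (0 = unvoiced frame).
    freqs: center frequency of each STFT bin in Hz.
    Returns a (n_bins, n_frames) boolean mask.
    """
    mask = np.zeros((len(freqs), len(f0_track)), dtype=bool)
    for t, f0 in enumerate(f0_track):
        if f0 <= 0:
            continue  # unvoiced frame: keep nothing
        for h in range(1, n_harmonics + 1):
            # keep bins within tol_hz of the h-th harmonic
            mask[np.abs(freqs - h * f0) < tol_hz, t] = True
    return mask

# Example: one voiced frame at 220 Hz, one unvoiced frame.
freqs = np.fft.rfftfreq(2048, 1.0 / 44100)
mask = harmonic_mask(np.array([220.0, 0.0]), freqs)
```

Applying this mask to the mixture's STFT and inverting yields the vocal estimate; bins away from the tracked harmonics are assigned to the background.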

Here are the singing voice estimates output by each algorithm.

In [93]:
# utilities.audio (from audio_embed, imported above) renders an inline player
print('PROJET estimate')
utilities.audio(separated['projet'][1])
print('REPET-SIM estimate')
utilities.audio(separated['fg']['RepetSim'])
print('MELODIA estimate')
utilities.audio(separated['fg']['Melodia'])
PROJET estimate