The original mixture is divided into two segments: the first 10 seconds, where the vocals are spatially separated from the rest of the mixture, and the remaining 20 seconds, where they are not. Here's the mixture audio.
from audio_embed import utilities
from nussl import AudioSignal

utilities.apply_style()
# Load the mixture and embed an audio player
mix = AudioSignal('audio/mix.wav')
utilities.audio(mix)
from nussl import AlgorithmSwitcher, AudioSignal, RepetSim, Melodia, Projet

separated = {'bg': {}, 'fg': {}}

def separate(mix, approach):
    if approach.__name__ == 'Projet':
        # PROJET estimates its sources directly; store its output separately
        s = approach(mix, num_sources=2, num_iterations=100)
        separated['projet'] = s.run()
    else:
        s = approach(mix)
        s.high_pass_cutoff = 0
        s.run()
        # The masking-based approaches return a (background, foreground) pair
        separated['bg'][approach.__name__], separated['fg'][approach.__name__] = s.make_audio_signals()

approaches = [Melodia, RepetSim, Projet]
for a in approaches:
    separate(mix, a)
* The published MELODIA is a predominant pitch tracker, not a source separation algorithm, but we use it to build a harmonic mask to separate out the vocals based on the pitch track.
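As a rough illustration of that idea (not nussl's Melodia-based separator), here is a minimal sketch of building a harmonic mask from a frame-level pitch track; the function and its parameters are hypothetical:

import numpy as np

def harmonic_mask(f0_per_frame, n_freq_bins, sample_rate, n_fft,
                  n_harmonics=25, half_width_hz=30.0):
    # Binary time-frequency mask that keeps bins near harmonics of f0.
    # f0_per_frame holds one fundamental-frequency estimate (Hz) per STFT
    # frame; 0 means "unvoiced" and leaves that column empty.
    # Illustrative sketch only, not the nussl implementation.
    bin_freqs = np.arange(n_freq_bins) * sample_rate / float(n_fft)
    mask = np.zeros((n_freq_bins, len(f0_per_frame)))
    for t, f0 in enumerate(f0_per_frame):
        if f0 <= 0:
            continue
        for k in range(1, n_harmonics + 1):
            center = k * f0
            if center > bin_freqs[-1]:
                break
            mask[np.abs(bin_freqs - center) <= half_width_hz, t] = 1.0
    return mask

Multiplying the mixture STFT by such a mask and inverting gives the vocal (foreground) estimate; the complement of the mask gives the background.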
print('PROJET estimate')
utilities.audio(separated['projet'][1])

print('REPET-SIM estimate')
utilities.audio(separated['fg']['RepetSim'])

print('MELODIA estimate')
utilities.audio(separated['fg']['Melodia'])
The PROJET estimate is very good for the first 10 seconds, while the vocals are spatially separated. After that, separation fails and the estimate is silent from 10 seconds to the end. The MELODIA and REPET-SIM estimates perform comparably. Ideally, we would use PROJET for the first 10 seconds of this mixture and then a combination of MELODIA and REPET-SIM for the last 20 seconds.
Below is the output of our system, which predicts the SDR of each algorithm's output in 1-second chunks and, for each chunk, uses the algorithm with the highest predicted SDR. Our system correctly uses PROJET for the first 10 seconds of the mixture and then switches between REPET-SIM and MELODIA for the remainder, predicting them to have comparable SDR.
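For intuition, the switching step itself can be sketched as below, assuming a matrix of predicted SDRs with one row per 1-second chunk and one column per algorithm; this helper is hypothetical and not the AlgorithmSwitcher API:

import numpy as np

def switch_by_predicted_sdr(estimates, predicted_sdrs, sample_rate, chunk_seconds=1.0):
    # estimates: list of 1-D numpy arrays (one per algorithm), all the same length.
    # predicted_sdrs: array of shape (n_chunks, n_algorithms).
    # For each chunk, copy the samples from the algorithm with the highest
    # predicted SDR into the output. Illustrative sketch only.
    chunk_len = int(chunk_seconds * sample_rate)
    n_samples = len(estimates[0])
    output = np.zeros(n_samples)
    best = np.argmax(predicted_sdrs, axis=1)  # winning algorithm per chunk
    for i, alg in enumerate(best):
        start, stop = i * chunk_len, min((i + 1) * chunk_len, n_samples)
        output[start:stop] = estimates[alg][start:stop]
    return output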
# Use PROJET's second estimated source as its foreground (vocal) estimate
separated['fg']['Projet'] = separated['projet'][1]
import matplotlib
import matplotlib.pyplot as plt

# Larger font for the plots below
matplotlib.rc('font', weight='normal', size=18)
# Pretrained vocal SDR predictor; adjust this path for your own setup
model_path = '/home/prem/research/nussl/nussl/separation/models/vocal_sdr_predictor.model'
switcher = AlgorithmSwitcher(mix,
                             [separated['fg'][a.__name__] for a in approaches],
                             [a.__name__ for a in approaches],
                             model=model_path)
bg_s, fg_s = switcher.run()

print('Proposed estimate: using the algorithm with the best predicted SDR, computed every 1 second.')
utilities.audio(fg_s)
plt.figure(figsize=(20, 10))

plt.subplot(211)
switcher.plot(None)
plt.title('Predicted algorithm (best and worst) over time in the mixture')

plt.subplot(212)
# One predicted-SDR curve over time per algorithm
for x in switcher.sdrs.T:
    plt.plot(x)
plt.xlim([0, 30])
plt.title('SDR estimate over time for each algorithm')
plt.legend([a.__name__ for a in approaches])

plt.tight_layout()
plt.show()