Duet
The DUET algorithm was originally proposed by S. Rickard and F. Dietrich for DOA estimation and further developed for BSS and demixing by A. Jourjine, S. Rickard, and O. Yilmaz. DUET extracts sources using the symmetric attenuation and relative delay between two channels. The symmetric attenuation is calculated from the ratio of the two channels' STFT amplitudes, and the delay is the arrival delay between the two sensors used to record the audio signal. These two values are accumulated in a two-dimensional histogram, whose peaks indicate where each source occurs. This implementation of DUET creates Mask objects and returns them from run(); the masks can then be applied to the original audio signal to extract each individual source.
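The two per-bin features described above can be sketched in NumPy as follows. This is a minimal illustration based on the formulation in Rickard's DUET chapter, not nussl's actual code: the function name `duet_features`, the `eps` guard, and the assumption that frequencies are given in radians per sample are all hypothetical.

```python
import numpy as np

def duet_features(stft_ch0, stft_ch1, frequencies, eps=1e-12):
    """Per-bin symmetric attenuation and relative delay (illustrative sketch).

    stft_ch0, stft_ch1: complex arrays, shape (num_freq_bins, num_time_bins)
    frequencies: nonzero angular frequency (radians/sample) of each row
    """
    omega = np.asarray(frequencies, dtype=float)[:, np.newaxis]
    # Inter-channel ratio; eps (an assumption) avoids division by zero.
    ratio = (stft_ch1 + eps) / (stft_ch0 + eps)
    attenuation = np.abs(ratio)
    # Symmetric attenuation: a - 1/a, which is 0 when both channels match.
    symmetric_atn = attenuation - 1.0 / attenuation
    # Relative delay recovered from the phase of the ratio.
    delay = -np.angle(ratio) / omega
    return symmetric_atn, delay

# Synthetic check: channel 1 is channel 0 scaled by 0.5 and delayed by 1.5.
omega = np.array([0.2, 0.4, 0.6, 0.8])
ch0 = (1.0 + 2.0j) * np.ones((4, 3))
ch1 = 0.5 * np.exp(-1j * omega[:, np.newaxis] * 1.5) * ch0
symmetric_atn, delay = duet_features(ch0, ch1, omega)
```

In this synthetic case every bin yields a symmetric attenuation of 0.5 - 1/0.5 = -1.5 and a delay of 1.5, because the phase never exceeds pi and so does not wrap.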
References
 Rickard, Scott. “The DUET blind source separation algorithm.” Blind Speech Separation. Springer Netherlands, 2007. 217–241.
 Yilmaz, Ozgur, and Scott Rickard. “Blind separation of speech mixtures via time-frequency masking.” Signal Processing, IEEE Transactions on 52.7 (2004): 1830–1847.
param input_audio_signal:  

a 2-row NumPy matrix containing samples of the two-channel mixture.  
type input_audio_signal:  
np.array  
param num_sources:  
Number of sources to find.  
type num_sources:  
int  
param attenuation_min:  
Minimum distance in utils.find_peak_indices; change it if not enough peaks are identified.  
type attenuation_min:  
int  
param attenuation_max:  
Used for creating a histogram without outliers.  
type attenuation_max:  
int  
param num_attenuation_bins:  
Number of bins for attenuation.  
type num_attenuation_bins:  
int  
param delay_min:  
Lower bound on delay, used as minimum distance in utils.find_peak_indices.  
type delay_min:  int 
param delay_max:  
Upper bound on delay, used for creating a histogram without outliers.  
type delay_max:  int 
param num_delay_bins:  
Number of bins for delay.  
type num_delay_bins:  
int  
param peak_threshold:  
Value in [0, 1] for peak picking.  
type peak_threshold:  
float  
param attenuation_min_distance:  
Minimum distance between peaks wrt attenuation.  
type attenuation_min_distance:  
int  
param delay_min_distance:  
Minimum distance between peaks wrt delay.  
type delay_min_distance:  
int  
param p:  Weight the histogram with the symmetric attenuation estimator. 
type p:  int 
param q:  Weight the histogram with the delay estimator. 
type q:  int 
Note: On page 8 of his paper, Rickard recommends p=1 and q=0 as a default starting point, and p=0.5, q=0 if one source is more dominant.
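As a rough sketch of how p and q could enter the histogram, the example below weights each time-frequency bin by (|X0||X1|)^p * |omega|^q, following Rickard's description. The function name `weighted_histogram`, its argument names, and the default ranges and bin counts are assumptions made for illustration and are not taken from nussl.

```python
import numpy as np

def weighted_histogram(symmetric_atn, delay, stft_ch0, stft_ch1, frequencies,
                       p=1, q=0, atn_range=(-3, 3), delay_range=(-3, 3),
                       num_atn_bins=50, num_delay_bins=50):
    """2D attenuation/delay histogram, scaled so its peak is 1 (sketch)."""
    omega = np.abs(np.asarray(frequencies, dtype=float))[:, np.newaxis]
    # p emphasizes high-energy bins; q emphasizes high-frequency bins.
    weights = (np.abs(stft_ch0) * np.abs(stft_ch1)) ** p * omega ** q
    hist, atn_edges, delay_edges = np.histogram2d(
        symmetric_atn.ravel(), delay.ravel(),
        bins=[num_atn_bins, num_delay_bins],
        range=[atn_range, delay_range],
        weights=weights.ravel())
    peak = hist.max()
    normalized = hist / peak if peak > 0 else hist
    return normalized, atn_edges, delay_edges

# Toy input: every bin shares the same attenuation/delay pair.
omega = np.array([0.2, 0.4, 0.6, 0.8])
ch0 = np.ones((4, 3), dtype=complex)
ch1 = 0.5 * ch0
atn = np.full((4, 3), -1.5)
dly = np.full((4, 3), 1.5)
normalized, atn_edges, delay_edges = weighted_histogram(atn, dly, ch0, ch1, omega)
```

With the default p=1, q=0 the weights reduce to the product of the two channel magnitudes, so louder bins contribute more to the peak than quiet ones.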

nussl.separation.Duet.stft_ch0
A NumPy matrix containing the STFT data of channel 0.
Type: np.array

nussl.separation.Duet.stft_ch1
A NumPy matrix containing the STFT data of channel 1.
Type: np.array

nussl.separation.Duet.frequency_matrix
A NumPy matrix containing the frequencies of analysis.
Type: np.array

nussl.separation.Duet.symmetric_atn
A NumPy matrix containing the symmetric attenuation between the two channels.
Type: np.array

nussl.separation.Duet.delay
A NumPy matrix containing the delay between the two channels.
Type: np.array

nussl.separation.Duet.num_time_bins
The number of time bins for the frequency matrix and mask arrays.
Type: int

nussl.separation.Duet.num_frequency_bins
The number of frequency bins for the mask arrays.
Type: int

nussl.separation.Duet.attenuation_bins
A NumPy array containing the attenuation bins for the histogram.
Type: np.array

nussl.separation.Duet.delay_bins
A NumPy array containing the delay bins for the histogram.
Type: np.array

nussl.separation.Duet.normalized_attenuation_delay_histogram
A normalized NumPy matrix containing the attenuation-delay histogram, which has peaks for each source.
Type: np.array

nussl.separation.Duet.attenuation_delay_histogram
A non-normalized NumPy matrix containing the attenuation-delay histogram, which has peaks for each source.
Type: np.array

nussl.separation.Duet.peak_indices
A NumPy array containing the indices of the peaks for the histogram.
Type: np.array
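Once histogram peaks are known, each time-frequency bin can be assigned to the peak that best explains the inter-channel relationship in that bin. The sketch below uses the maximum-likelihood distance from Rickard's chapter to build one binary mask per peak. It is an illustration only: `masks_from_peaks` and its arguments are hypothetical names, peaks are passed as attenuation/delay values rather than as histogram indices, and nussl's own mask construction may differ.

```python
import numpy as np

def masks_from_peaks(stft_ch0, stft_ch1, frequencies, peak_atn, peak_delay):
    """Assign each time-frequency bin to its best-matching peak (sketch)."""
    omega = np.asarray(frequencies, dtype=float)[:, np.newaxis]
    scores = []
    for alpha, delta in zip(peak_atn, peak_delay):
        # Invert alpha = a - 1/a to recover the plain attenuation a.
        a = (alpha + np.sqrt(alpha ** 2 + 4.0)) / 2.0
        # How poorly this (a, delta) pair maps channel 0 onto channel 1.
        residual = a * np.exp(-1j * omega * delta) * stft_ch0 - stft_ch1
        scores.append(np.abs(residual) ** 2 / (1.0 + a ** 2))
    best = np.argmin(np.stack(scores), axis=0)
    return [best == j for j in range(len(scores))]

# Two synthetic sources occupying disjoint frequency rows.
omega = np.array([0.2, 0.4, 0.6, 0.8])
ch0 = np.ones((4, 3), dtype=complex)
true_a = np.array([0.5, 0.5, 2.0, 2.0])
true_delta = np.array([1.5, 1.5, -1.0, -1.0])
ch1 = (true_a * np.exp(-1j * omega * true_delta))[:, np.newaxis] * ch0
masks = masks_from_peaks(ch0, ch1, omega,
                         peak_atn=[-1.5, 1.5], peak_delay=[1.5, -1.0])
```

Here the first mask selects the top two rows and the second mask selects the bottom two, matching how the synthetic mixture was built.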

nussl.separation.Duet.separated_sources
A NumPy array of arrays containing each separated source.
Type: np.array

nussl.separation.Duet.audio_signal
Copy of the audio_signal.AudioSignal object passed in upon initialization.
Type: (audio_signal.AudioSignal)

nussl.separation.Duet.mask_threshold
PROPERTY
Threshold for determining True/False if mask_type is BINARY_MASK. Some algorithms will first make a soft mask and then convert that to a binary mask using this threshold parameter. All values of the soft mask are between [0.0, 1.0], and as such mask_threshold() is expected to be a float between [0.0, 1.0].
Returns: mask_threshold (float) – Value between [0.0, 1.0] that indicates the True/False cutoff when converting a soft mask to binary mask.
Raises: ValueError if not a float or if set outside [0.0, 1.0].
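The soft-to-binary conversion described for mask_threshold can be sketched as a simple comparison. This is an assumption-laden illustration: the helper name `soft_to_binary` is hypothetical, and whether nussl treats values exactly equal to the threshold as True or False is not stated in this documentation, so the strict `>` used here is a guess.

```python
import numpy as np

def soft_to_binary(soft_mask, mask_threshold=0.5):
    """Convert a soft mask with values in [0.0, 1.0] to a boolean mask (sketch)."""
    if not isinstance(mask_threshold, float) or not 0.0 <= mask_threshold <= 1.0:
        raise ValueError("mask_threshold must be a float in [0.0, 1.0]")
    # Strict comparison is an assumption; the library may use >= instead.
    return np.asarray(soft_mask) > mask_threshold

soft = np.array([[0.1, 0.5],
                 [0.7, 1.0]])
binary = soft_to_binary(soft, mask_threshold=0.5)
```

Raising the threshold keeps only the bins where the soft mask is most confident, at the cost of discarding more of the source's energy.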

nussl.separation.Duet.mask_type
PROPERTY
This property indicates what type of mask the derived algorithm will create and return from run(). Options are either 'soft' or 'binary'. mask_type is usually set when initializing a MaskSeparationBase derived class and defaults to SOFT_MASK.
This property, though stored as a string, can be set in two ways when initializing:
First, it is possible to set this property with a string. Only 'soft' and 'binary' are accepted (case insensitive); every other value will raise an error. When initializing with a string, two helper attributes are provided: BINARY_MASK and SOFT_MASK.
It is HIGHLY encouraged to use these, as the API may change and code that uses bare strings (e.g. mask_type = 'soft' or mask_type = 'binary') for assignment might not be future-proof. BINARY_MASK and SOFT_MASK are safe aliases in case these underlying types change.
The second way to set this property is by using a class prototype of either the separation.masks.binary_mask.BinaryMask or separation.masks.soft_mask.SoftMask class. This is probably the most stable way to set this, and it's fairly succinct. For example, mask_type = nussl.BinaryMask or mask_type = nussl.SoftMask are both perfectly valid.
Though uncommon, this can be set outside of __init__(). Examples of both methods are shown below.
Returns: mask_type (str) – Either 'soft' or 'binary'.
Raises: ValueError if set invalidly.
Example:
    import nussl

    mixture_signal = nussl.AudioSignal()

    # Two options for determining mask upon init...

    # Option 1: Init with a string (BINARY_MASK is a string 'constant')
    repet_sim = nussl.RepetSim(mixture_signal, mask_type=nussl.MaskSeparationBase.BINARY_MASK)

    # Option 2: Init with a class type
    ola = nussl.OverlapAdd(mixture_signal, mask_type=nussl.SoftMask)

    # It's also possible to change these values after init by changing the `mask_type` property...
    repet_sim.mask_type = nussl.MaskSeparationBase.SOFT_MASK  # using a string
    ola.mask_type = nussl.BinaryMask  # or using a class type

nussl.separation.Duet.sample_rate
Sample rate of audio_signal. Literally audio_signal.sample_rate.
Type: (int)

nussl.separation.Duet.stft_params
spectral_utils.StftParams of audio_signal. Literally audio_signal.stft_params.
Type: (spectral_utils.StftParams)