Duet

The DUET algorithm was originally proposed by S. Rickard and F. Dietrich for direction-of-arrival (DOA) estimation and further developed for blind source separation (BSS) and demixing by A. Jourjine, S. Rickard, and O. Yilmaz. DUET extracts sources using the symmetric attenuation and relative delay between two channels. The symmetric attenuation is calculated from the ratio of the two channels’ STFT amplitudes, and the delay is the difference in arrival time between the two sensors used to record the audio signal. These two values are accumulated in a two-dimensional histogram, where each source shows up as a peak. In this implementation of DUET, run() creates and returns Mask objects, which can then be applied to the original audio signal to extract each individual source.
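The two features described above can be sketched with plain NumPy. This is a minimal illustration, not the nussl implementation: the function name `duet_features`, the synthetic STFT data, and the assumption that no time-frequency bin of channel 0 is exactly zero are all hypothetical. The sign convention for the delay follows Rickard's chapter (delay = -angle(ratio) / omega).

```python
import numpy as np

def duet_features(stft_ch0, stft_ch1, omega):
    """Return (symmetric_atn, delay) for two STFTs of shape (n_freq, n_time).

    omega holds the (nonzero) angular frequency of each STFT row, in
    radians per sample. Assumes stft_ch0 has no exactly-zero bins.
    """
    ratio = stft_ch1 / stft_ch0
    a = np.abs(ratio)                           # relative attenuation
    symmetric_atn = a - 1.0 / a                 # symmetric attenuation
    delay = -np.angle(ratio) / omega[:, None]   # relative delay in samples
    return symmetric_atn, delay

# Synthetic check: channel 1 is channel 0 scaled by 0.8 and delayed by 0.5 samples.
rng = np.random.default_rng(0)
n_freq, n_time = 64, 10
omega = np.pi * np.arange(1, n_freq + 1) / n_freq   # frequencies in (0, pi]
stft_ch0 = (rng.standard_normal((n_freq, n_time))
            + 1j * rng.standard_normal((n_freq, n_time)))
true_a, true_delay = 0.8, 0.5
stft_ch1 = true_a * np.exp(-1j * omega[:, None] * true_delay) * stft_ch0

symmetric_atn, delay = duet_features(stft_ch0, stft_ch1, omega)
```

The delay estimate is only unambiguous while |omega * delay| stays below pi, which is why the synthetic delay here is kept under one sample.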

References

  • Rickard, Scott. “The DUET blind source separation algorithm.” Blind Speech Separation. Springer Netherlands, 2007. 217-241.
  • Yilmaz, Ozgur, and Scott Rickard. “Blind separation of speech mixtures via time-frequency masking.” Signal Processing, IEEE transactions on 52.7 (2004): 1830-1847.
param input_audio_signal:
 a 2-row Numpy matrix containing samples of the two-channel mixture.
type input_audio_signal:
 np.array
param num_sources:
 Number of sources to find.
type num_sources:
 int
param attenuation_min:
 Minimum distance in utils.find_peak_indices, change if not enough peaks are identified.
type attenuation_min:
 int
param attenuation_max:
 Used for creating a histogram without outliers.
type attenuation_max:
 int
param num_attenuation_bins:
 Number of bins for attenuation.
type num_attenuation_bins:
 int
param delay_min:
 Lower bound on delay, used as minimum distance in utils.find_peak_indices.
type delay_min:
 int
param delay_max:
 Upper bound on delay, used for creating a histogram without outliers.
type delay_max:
 int
param num_delay_bins:
 Number of bins for delay.
type num_delay_bins:
 int
param peak_threshold:
 Value in [0, 1] for peak picking.
type peak_threshold:
 float
param attenuation_min_distance:
 Minimum distance between peaks wrt attenuation.
type attenuation_min_distance:
 int
param delay_min_distance:
 Minimum distance between peaks wrt delay.
type delay_min_distance:
 int
param p:
 Weight the histogram with the symmetric attenuation estimator.
type p:
 int
param q:
 Weight the histogram with the delay estimator.
type q:
 int

On page 8 of his paper, Rickard recommends p=1 and q=0 as a default starting point, and p=.5, q=0 if one source is more dominant.
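To make the roles of p, q, the bin counts, and the min/max bounds concrete, here is a rough sketch of a weighted attenuation-delay histogram. It assumes the weighting from Rickard's chapter, (|X1||X2|)^p * omega^q, and uses `numpy.histogram2d`; the function name `weighted_histogram` and its default ranges are hypothetical and not taken from nussl.

```python
import numpy as np

def weighted_histogram(symmetric_atn, delay, stft_ch0, stft_ch1, omega,
                       p=1, q=0, atn_range=(-3.0, 3.0),
                       delay_range=(-3.0, 3.0), bins=(50, 50)):
    """Build a normalized 2D histogram over (attenuation, delay).

    Points outside atn_range / delay_range are ignored by histogram2d,
    which is how outliers are kept out of the histogram.
    """
    weights = ((np.abs(stft_ch0) * np.abs(stft_ch1)) ** p
               * np.abs(omega[:, None]) ** q)
    hist, atn_edges, delay_edges = np.histogram2d(
        symmetric_atn.ravel(), delay.ravel(),
        bins=bins, range=[atn_range, delay_range],
        weights=weights.ravel())
    peak = hist.max()
    if peak > 0:
        hist = hist / peak   # normalize so the tallest peak is 1.0
    return hist, atn_edges, delay_edges

# Toy input: every time-frequency bin points at the same (0.5, 1.0) location.
n_freq, n_time = 8, 4
omega = np.linspace(0.1, np.pi, n_freq)
stft_ch0 = np.ones((n_freq, n_time), dtype=complex)
stft_ch1 = np.ones((n_freq, n_time), dtype=complex)
symmetric_atn = np.full((n_freq, n_time), 0.5)
delay = np.full((n_freq, n_time), 1.0)

hist, atn_edges, delay_edges = weighted_histogram(
    symmetric_atn, delay, stft_ch0, stft_ch1, omega, p=1, q=0)
i, j = np.unravel_index(hist.argmax(), hist.shape)
```

With q=0 the frequency term drops out, so the weights depend only on the product of the two channel magnitudes raised to p.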
nussl.separation.Duet.stft_ch0

A Numpy matrix containing the stft data of channel 0.

Type:np.array
nussl.separation.Duet.stft_ch1

A Numpy matrix containing the stft data of channel 1.

Type:np.array
nussl.separation.Duet.frequency_matrix

A Numpy matrix containing the frequencies of analysis.

Type:np.array
nussl.separation.Duet.symmetric_atn

A Numpy matrix containing the symmetric attenuation between the two channels.

Type:np.array
nussl.separation.Duet.delay

A Numpy matrix containing the delay between the two channels.

Type:np.array
nussl.separation.Duet.num_time_bins

The number of time bins for the frequency matrix and mask arrays.

Type:int
nussl.separation.Duet.num_frequency_bins

The number of frequency bins for the mask arrays.

Type:int
nussl.separation.Duet.attenuation_bins

A Numpy array containing the attenuation bins for the histogram.

Type:np.array
nussl.separation.Duet.delay_bins

A Numpy array containing the delay bins for the histogram.

Type:np.array
nussl.separation.Duet.normalized_attenuation_delay_histogram

A normalized Numpy matrix containing the attenuation delay histogram, which has peaks for each source.

Type:np.array
nussl.separation.Duet.attenuation_delay_histogram

A non-normalized Numpy matrix containing the attenuation delay histogram, which has peaks for each source.

Type:np.array
nussl.separation.Duet.peak_indices

A Numpy array containing the indices of the peaks for the histogram.

Type:np.array
nussl.separation.Duet.separated_sources

A Numpy array of arrays containing each separated source.

Type:np.array
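
The attributes above trace the DUET pipeline from STFTs to histogram peaks to separated sources. The step from peaks to masks can be sketched as follows. This is a simplified illustration that assigns each time-frequency bin to the nearest peak by Euclidean distance in (attenuation, delay) space; Rickard's chapter uses a likelihood-based score instead, and the function name `masks_from_peaks` is hypothetical rather than part of nussl.

```python
import numpy as np

def masks_from_peaks(symmetric_atn, delay, peaks):
    """Return one boolean mask per peak, each shaped like symmetric_atn.

    peaks is a sequence of (attenuation, delay) pairs, one per source.
    Every time-frequency bin is assigned to exactly one peak.
    """
    peaks = np.asarray(peaks, dtype=float)               # (num_sources, 2)
    dist = ((symmetric_atn[None] - peaks[:, 0, None, None]) ** 2
            + (delay[None] - peaks[:, 1, None, None]) ** 2)
    nearest = dist.argmin(axis=0)                        # index of closest peak
    return [nearest == k for k in range(len(peaks))]

# Toy 2x2 example with two well-separated peaks.
symmetric_atn = np.array([[-1.0, 0.9], [1.1, -0.8]])
delay = np.array([[-1.0, 1.2], [0.8, -1.1]])
masks = masks_from_peaks(symmetric_atn, delay, peaks=[(-1.0, -1.0), (1.0, 1.0)])
```

Multiplying each mask element-wise with one channel's STFT and inverting the result would then give an estimate of the corresponding source.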

Examples

The DUET Demo Example

nussl.separation.Duet.audio_signal

Copy of the audio_signal.AudioSignal object passed in upon initialization.

Type:(audio_signal.AudioSignal)
nussl.separation.Duet.mask_threshold

PROPERTY

Threshold for deciding True/False when mask_type is BINARY_MASK. Some algorithms first make a soft mask and then convert it to a binary mask using this threshold. All values of the soft mask lie in [0.0, 1.0], so mask_threshold is expected to be a float in [0.0, 1.0].

Returns:mask_threshold (float) – Value between [0.0, 1.0] that indicates the True/False cutoff when converting a soft mask to binary mask.
Raises:ValueError if not a float or if set outside [0.0, 1.0].
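
As an illustration of the thresholding described above, a soft mask can be converted to a binary mask with a single comparison. This is a sketch only: the helper name `soft_to_binary` is hypothetical, and whether the library treats values equal to the threshold as True or False is an assumption (a strict greater-than is used here).

```python
import numpy as np

def soft_to_binary(soft_mask, mask_threshold=0.5):
    """Convert a soft mask with values in [0.0, 1.0] to a boolean mask."""
    if not isinstance(mask_threshold, float) or not 0.0 <= mask_threshold <= 1.0:
        raise ValueError('mask_threshold must be a float in [0.0, 1.0]')
    # Assumption: values strictly above the threshold become True.
    return np.asarray(soft_mask) > mask_threshold

soft_mask = np.array([[0.1, 0.5], [0.7, 0.95]])
binary_mask = soft_to_binary(soft_mask, mask_threshold=0.5)
```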
nussl.separation.Duet.mask_type

PROPERTY

This property indicates what type of mask the derived algorithm will create and return from run(). Options are either ‘soft’ or ‘binary’. mask_type is usually set when initializing a MaskSeparationBase-derived class and defaults to SOFT_MASK.

This property, though stored as a string, can be set in two ways when initializing:

  • First, it is possible to set this property with a string. Only 'soft' and 'binary' are accepted (case insensitive); every other value will raise an error. When initializing with a string, two helper attributes are provided: BINARY_MASK and SOFT_MASK.

    It is HIGHLY encouraged to use these, as the API may change and code that uses bare strings (e.g. mask_type = 'soft' or mask_type = 'binary') for assignment might not be future-proof. BINARY_MASK and SOFT_MASK are safe aliases in case these underlying types change.

  • The second way to set this property is by using the class prototype of either separation.masks.binary_mask.BinaryMask or separation.masks.soft_mask.SoftMask. This is probably the most stable way to set this, and it’s fairly succinct. For example, mask_type = nussl.BinaryMask or mask_type = nussl.SoftMask are both perfectly valid.

Though uncommon, this property can also be set outside of __init__().

Examples of both methods are shown below.

Returns:mask_type (str) – Either 'soft' or 'binary'.
Raises:ValueError if set invalidly.

Example:

import nussl
mixture_signal = nussl.AudioSignal()

# Two options for determining mask upon init...

# Option 1: Init with a string (BINARY_MASK is a string 'constant')
repet_sim = nussl.RepetSim(mixture_signal, mask_type=nussl.MaskSeparationBase.BINARY_MASK)

# Option 2: Init with a class type
ola = nussl.OverlapAdd(mixture_signal, mask_type=nussl.SoftMask)

# It's also possible to change these values after init by changing the `mask_type` property...
repet_sim.mask_type = nussl.MaskSeparationBase.SOFT_MASK  # using a string
ola.mask_type = nussl.BinaryMask  # or using a class type
nussl.separation.Duet.sample_rate

Sample rate of audio_signal. Literally audio_signal.sample_rate.

Type:(int)
nussl.separation.Duet.stft_params

spectral_utils.StftParams of audio_signal. Literally audio_signal.stft_params.

Type:(spectral_utils.StftParams)