IdealMask

IdealMask separates sources using the ideal binary or soft mask from ground truth. It accepts a list of audio_signal.AudioSignal objects, each of which contains a known source, and applies a mask to a time-frequency representation of the input mixture created from each of the known sources. This is often used as an upper bound on source separation performance when benchmarking new algorithms, as it represents the best possible scenario for mask-based methods.

At the time of this writing, the time-frequency representation used by this class is the magnitude spectrogram.

This class is derived from separation.mask_separation_base.MaskSeparationBase so its run() method returns a list of separation.masks.mask_base.MaskBase objects.

class nussl.separation.ideal_mask.IdealMask(input_audio_mixture, sources_list, power=1, split_zeros=False, binary_db_threshold=20, mask_type='soft', use_librosa_stft=False)

Bases: nussl.separation.mask_separation_base.MaskSeparationBase

Parameters:
  • input_audio_mixture (audio_signal.AudioSignal) – Input audio_signal.AudioSignal mixture to create the masks from.
  • sources_list (list) – List of audio_signal.AudioSignal objects where each one represents an isolated source in the mixture.
  • mask_type (str, Optional) – Indicates whether to make a binary or soft mask. Optional, defaults to SOFT_MASK.
  • use_librosa_stft (bool, Optional) – Whether to use librosa’s STFT function. Optional, defaults to config settings.
sources

List of audio_signal.AudioSignal objects from __init__() where each object represents a single isolated source within the mixture.

Type:list
estimated_masks

List of resultant separation.masks.mask_base.MaskBase objects created. Masks in this list are in the same order as sources_list (and sources).

Type:list
estimated_sources

List of audio_signal.AudioSignal objects created from applying the created masks to the mixture.

Type:list
Raises:ValueError – If not all items in sources_list are audio_signal.AudioSignal objects, OR if not all of the audio_signal.AudioSignal objects in sources_list have the same sample rate and number of channels, OR if input_audio_mixture has a different sample rate or number of channels than the audio_signal.AudioSignal objects in sources_list.

Example:

import nussl
import os

path_to_drums = os.path.join('demo_files', 'drums.wav')
path_to_flute = os.path.join('demo_files', 'flute.wav')

drums = nussl.AudioSignal(path_to_drums)
drums.to_mono(overwrite=True)  # make it mono

flute = nussl.AudioSignal(path_to_flute)
flute.truncate_samples(drums.signal_length)  # shorten the flute solo

# Make a mixture and increase the gain on the flute
mixture = drums + flute * 3.0

# Run IdealMask making binary masks
ideal_mask = nussl.IdealMask(mixture, [drums, flute], mask_type=nussl.BinaryMask)
ideal_mask.run()
ideal_drums, ideal_flute = ideal_mask.make_audio_signals()
ideal_residual = ideal_mask.residual  # Left over audio that was not captured by the mask
run()

Creates a list of masks (as separation.masks.mask_base.MaskBase objects, either separation.masks.binary_mask.BinaryMask or separation.masks.soft_mask.SoftMask depending on how the object was instantiated) from a list of known source signals (sources_list in the constructor).

Returns a list of separation.masks.mask_base.MaskBase objects (one for each input signal) in the order that they were provided when this object was initialized.

Binary masks are created based on the magnitude spectrogram using the following formula:

mask = (provided_source.mag_spec >= (mixture_mag_spec - provided_source.mag_spec))
mask = (20 * np.log10(source.mag_spec / mixture.mag_spec)) > binary_db_threshold

Where ‘/’ is an element-wise division and ‘>’ is an element-wise logical greater-than.
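The dB-threshold form of the formula can be sketched in plain NumPy. This only illustrates the arithmetic and is not nussl's internal code: the function name binary_mask_from_db, the eps guard against division by zero, and the toy arrays are assumptions made for this example.

```python
import numpy as np

def binary_mask_from_db(source_mag, mixture_mag, db_threshold=20.0, eps=1e-12):
    """Hypothetical helper: True where the source-to-mixture ratio in dB exceeds the threshold."""
    # eps is an assumption added here to avoid dividing by zero or taking log10(0)
    ratio = (source_mag + eps) / (mixture_mag + eps)  # element-wise division
    return 20.0 * np.log10(ratio) > db_threshold      # element-wise greater-than

# Toy magnitude spectrograms with shape (n_freq_bins, n_frames)
source_mag = np.array([[10.0, 1.0], [0.5, 4.0]])
mixture_mag = np.array([[0.5, 2.0], [0.5, 0.1]])

mask = binary_mask_from_db(source_mag, mixture_mag)
```

The result is a boolean array with the same shape as the spectrograms, one True/False decision per time-frequency bin.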

Soft masks are also created based on the magnitude spectrogram but use the following formula:

  1. mask = mixture_mag_spec / provided_source.mag_spec
  2. mask = log(mask)
  3. mask = (mask + abs(min(mask))) / max(mask)

Where all arithmetic operations and log are element-wise. This provides a logarithmically scaled mask that is in the interval [0.0, 1.0].
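A minimal NumPy sketch of the three steps above, written from the description rather than from nussl's source. The function name log_soft_mask and the eps guard are assumptions, and the sketch assumes the maximum in step 3 is taken after the shift, which is what keeps the result inside [0.0, 1.0].

```python
import numpy as np

def log_soft_mask(source_mag, mixture_mag, eps=1e-12):
    """Hypothetical helper following the three documented steps."""
    mask = (mixture_mag + eps) / (source_mag + eps)  # step 1: element-wise ratio
    mask = np.log(mask)                              # step 2: element-wise log
    mask = mask + np.abs(mask.min())                 # step 3a: shift by |min|
    peak = mask.max()                                # assumption: max taken after the shift
    if peak <= 0:
        return np.zeros_like(mask)                   # degenerate case: nothing to scale
    return mask / peak                               # step 3b: scale so the largest value is 1

# Toy magnitude spectrograms with shape (n_freq_bins, n_frames)
source_mag = np.array([[1.0, 2.0], [4.0, 8.0]])
mixture_mag = np.full((2, 2), 2.0)

mask = log_soft_mask(source_mag, mixture_mag)
```

With the ratio written as mixture over source, the largest mask values fall on bins where the source is weak relative to the mixture; swap the operands in step 1 if the opposite convention is wanted.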

Returns:estimated_masks (list) – List of resultant separation.masks.mask_base.MaskBase objects created. Masks in this list are in the same order as sources_list (and sources).
Raises:RuntimeError – If an unknown mask type is provided (options are BinaryMask or SoftMask).
residual

This is an audio_signal.AudioSignal object that contains the leftover audio that was not captured by the masks. The residual is calculated in the time domain: after all of the masks are created by run() and the corresponding audio_signal.AudioSignal objects are made (using make_audio_signals(), which applies the masks to the mixture STFT and does an inverse STFT for each source), the residual is simply the original mixture with the estimated sources subtracted from it.

Returns:

residual (audio_signal.AudioSignal) – audio_signal.AudioSignal object that contains the left over audio that was not captured by creating the masks.

Raises:
  • ValueError – If run() has not been called.
  • Exception – If there was an unforeseen issue.
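As a rough illustration of a time-domain residual, the sketch below subtracts each estimated source from the mixture sample by sample. It assumes the residual is the mixture minus the sum of the estimates and uses bare NumPy arrays in place of audio_signal.AudioSignal objects; the name time_domain_residual is hypothetical.

```python
import numpy as np

def time_domain_residual(mixture, estimates):
    """Hypothetical helper: mixture minus the sum of the estimated sources."""
    residual = np.array(mixture, dtype=float)  # copy so the input is left untouched
    for estimate in estimates:
        residual -= estimate                   # sample-by-sample subtraction
    return residual

# Toy mono signals of equal length
mixture = np.array([1.0, 0.5, -0.25, 0.0])
est_a = np.array([0.6, 0.1, -0.25, 0.0])
est_b = np.array([0.3, 0.4, 0.0, 0.0])

residual = time_domain_residual(mixture, [est_a, est_b])
```

Whatever the estimates fail to explain stays in residual; a residual of all zeros would mean the estimates sum back to the mixture exactly.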
make_audio_signals()

Returns a list of signals (as audio_signal.AudioSignal objects) created by applying the ideal masks. The signals are created by element-wise multiplying the masks with the mixture STFT. run() must be called before this method, or else it will throw an error. The signals are in the same order as the known sources were given to the constructor (the sources_list parameter, or the sources attribute).

Returns:estimated_sources (list) – List of audio_signal.AudioSignal objects that represent the sources created by applying a mask from the known source to the mixture
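The masking step itself is a single element-wise product per source. The sketch below shows it on a random complex array standing in for the mixture STFT; the array shapes and the two complementary binary masks are assumptions chosen for illustration, and the inverse STFT that would turn each product back into audio is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a complex mixture STFT with shape (n_freq_bins, n_frames)
mixture_stft = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# Two hypothetical binary masks that split the bins between two sources
mask_a = rng.random((5, 4)) > 0.5
mask_b = ~mask_a

# Element-wise multiply each mask with the mixture STFT
estimated_stfts = [mask * mixture_stft for mask in (mask_a, mask_b)]
```

Because these two masks are complementary, every bin is assigned to exactly one estimate and the estimates sum back to the mixture STFT. Masks that overlap or leave bins unassigned would not have that property.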

Example:

ideal_mask = nussl.IdealMask(mixture, [drums, flute], mask_type=nussl.BinaryMask)
ideal_mask.run()
ideal_drums, ideal_flute = ideal_mask.make_audio_signals()
BINARY_MASK = 'binary'
SOFT_MASK = 'soft'
audio_signal

Copy of the audio_signal.AudioSignal object passed in upon initialization.

Type:(audio_signal.AudioSignal)
classmethod from_json(json_string)

Creates a new SeparationBase object from the parameters stored in this JSON string.

Parameters:json_string (str) – A JSON string containing all the data to create a new SeparationBase object.
Returns:(SeparationBase) A new SeparationBase object from the JSON string.

See also

to_json() to make a JSON string to freeze this object.

mask_threshold

PROPERTY

Threshold for determining True/False if mask_type is BINARY_MASK. Some algorithms first make a soft mask and then convert it to a binary mask using this threshold. All values of the soft mask lie in [0.0, 1.0], so mask_threshold is expected to be a float in [0.0, 1.0].

Returns:mask_threshold (float) – Value between [0.0, 1.0] that indicates the True/False cutoff when converting a soft mask to binary mask.
Raises:ValueError if not a float or if set outside [0.0, 1.0].
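The soft-to-binary conversion described for this property can be sketched as a thresholding step. The helper name soft_to_binary, the 0.5 default, and the choice of a strict greater-than comparison are assumptions for this example, not values taken from nussl.

```python
import numpy as np

def soft_to_binary(soft_mask, mask_threshold=0.5):
    """Hypothetical helper: threshold a soft mask in [0.0, 1.0] into a boolean mask."""
    # Mirror the documented constraint: a float inside [0.0, 1.0]
    if not isinstance(mask_threshold, float) or not 0.0 <= mask_threshold <= 1.0:
        raise ValueError('mask_threshold must be a float in [0.0, 1.0]')
    return np.asarray(soft_mask) > mask_threshold  # assumption: strict comparison

soft = np.array([[0.1, 0.5], [0.9, 0.51]])
binary = soft_to_binary(soft, mask_threshold=0.5)
```

Raising the threshold makes the binary mask more selective, keeping only the bins where the soft mask is most confident.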
mask_type

PROPERTY

This property indicates what type of mask the derived algorithm will create and return from run(). Options are either ‘soft’ or ‘binary’. mask_type is usually set when initializing a MaskSeparationBase-derived class and defaults to SOFT_MASK.

This property, though stored as a string, can be set in two ways when initializing:

  • First, it is possible to set this property with a string. Only 'soft' and 'binary' are accepted (case insensitive); every other value raises an error. When initializing with a string, two helper attributes are provided: BINARY_MASK and SOFT_MASK.

    It is HIGHLY encouraged to use these, as the API may change and code that uses bare strings (e.g. mask_type = 'soft' or mask_type = 'binary') for assignment might not be future-proof. BINARY_MASK and SOFT_MASK are safe aliases in case these underlying types change.

  • The second way to set this property is by using a class prototype: either separation.masks.binary_mask.BinaryMask or separation.masks.soft_mask.SoftMask. This is probably the most stable way to set this, and it’s fairly succinct. For example, mask_type = nussl.BinaryMask and mask_type = nussl.SoftMask are both perfectly valid.

Though uncommon, this can also be set outside of __init__().

Examples of both methods are shown below.

Returns:mask_type (str) – Either 'soft' or 'binary'.
Raises:ValueError if set invalidly.

Example:

import nussl
mixture_signal = nussl.AudioSignal()

# Two options for determining mask upon init...

# Option 1: Init with a string (BINARY_MASK is a string 'constant')
repet_sim = nussl.RepetSim(mixture_signal, mask_type=nussl.MaskSeparationBase.BINARY_MASK)

# Option 2: Init with a class type
ola = nussl.OverlapAdd(mixture_signal, mask_type=nussl.SoftMask)

# It's also possible to change these values after init by changing the `mask_type` property...
repet_sim.mask_type = nussl.MaskSeparationBase.SOFT_MASK  # using a string
ola.mask_type = nussl.BinaryMask  # or using a class type
ones_mask(shape)
Parameters:shape

Returns:

plot(output_name, **kwargs)

Plots relevant data for mask-based separation algorithm. Base class: Do not call directly!

Raises:NotImplementedError – Cannot call base class!
sample_rate

Sample rate of audio_signal. Literally audio_signal.sample_rate.

Type:(int)
stft_params

spectral_utils.StftParams of audio_signal. Literally audio_signal.stft_params.

Type:(spectral_utils.StftParams)
to_json()

Outputs JSON from the data stored in this object.

Returns:(str) a JSON string containing all of the information to restore this object exactly as it was when this was called.

See also

from_json() to restore a JSON frozen object.

zeros_mask(shape)

Creates a new zeros mask with this object’s type.

Parameters:shape

Returns: