DeepClustering

Deep Clustering Separation Class

class nussl.separation.deep_clustering.DeepClustering(input_audio_signal, mask_type='soft', model_path='/media/ext/models/deep_clustering_vocal_44k_long.model', num_sources=2, num_layers=4, hidden_size=500, max_distance=1, embedding_size=20, num_mels=150, do_mono=False, resample_rate=44100, use_librosa_stft=False, cutoff=-40)

Bases: nussl.separation.mask_separation_base.MaskSeparationBase

Implements deep clustering for source separation, using PyTorch.

Deep clustering is a deep learning approach to source separation. It takes as input a mel-spectrogram representation of an audio mixture. Each time-frequency bin is mapped into a K-dimensional embedding. The model is trained so that time-frequency bins dominated by different sources map to embeddings that are far apart, while bins dominated by the same source map to embeddings that are close together. The sources are then recovered by running K-means clustering on the embedding space.
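For intuition, the clustering step can be sketched with scikit-learn's KMeans on a stand-in embedding array. The shapes, names, and random data below are illustrative assumptions, not nussl's internal code.

import numpy as np
from sklearn.cluster import KMeans

num_time, num_mels, embedding_size, num_sources = 400, 150, 20, 2

# Stand-in embeddings: one K-dimensional vector per time-frequency bin.
embeddings = np.random.randn(num_time * num_mels, embedding_size)

# Cluster the bins; each cluster corresponds to one source.
assignments = KMeans(n_clusters=num_sources).fit_predict(embeddings)

# Turn the cluster labels into one binary mask per source on the (time, mel) grid.
masks = [(assignments == k).reshape(num_time, num_mels).astype(float)
         for k in range(num_sources)]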

References:

Hershey, J. R., Chen, Z., Le Roux, J., & Watanabe, S. (2016, March). Deep clustering: Discriminative embeddings for segmentation and separation. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 31-35). IEEE.

Luo, Y., Chen, Z., Hershey, J. R., Roux, J. L., & Mesgarani, N. (2016). Deep Clustering and Conventional Networks for Music Separation: Stronger Together. arXiv preprint arXiv:1611.06265.

Example

from nussl import AudioSignal
from nussl.separation.deep_clustering import DeepClustering
import matplotlib.pyplot as plt

music = AudioSignal('path/to/input.wav', offset=45, duration=20)

music.stft_params.window_length = 2048
music.stft_params.hop_length = 512

separation = DeepClustering(music, num_sources=2)
masks = separation.run()
sources = separation.make_audio_signals()

plt.figure(figsize=(20, 8))
separation.plot()
plt.tight_layout()
plt.show()

load_model(model_path)

Loads the model at the specified path.

Parameters:model_path – Path to the model file to load.

Returns:

deep_clustering()

Returns:

generate_mask(ch, assignments)

Takes binary mel-spectrogram assignments and generates a mask.
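Since the assignments live on the mel grid while the mixture's STFT uses linear frequency bins, one plausible way to expand them is through the mel filterbank. The librosa-based sketch below is an illustrative assumption about that expansion, not nussl's exact implementation.

import numpy as np
import librosa

sample_rate, n_fft, num_mels = 44100, 2048, 150
num_time = 400

# Stand-in binary assignments on the (time, mel) grid for one source.
assignments = (np.random.rand(num_time, num_mels) > 0.5).astype(float)

# Mel filterbank: (num_mels, n_fft // 2 + 1) weights from linear bins to mel bins.
mel_fb = librosa.filters.mel(sr=sample_rate, n_fft=n_fft, n_mels=num_mels)

# Spread each mel bin's assignment across the linear bins it covers,
# then clip so the result behaves like a mask in [0, 1].
mask = np.clip(assignments @ (mel_fb > 0).astype(float), 0.0, 1.0)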

run()

Returns:

apply_mask(mask)

Applies an individual mask and returns the resulting audio_signal object.
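Conceptually, applying a soft mask is an element-wise multiplication with the mixture STFT. The numpy sketch below illustrates that idea with stand-in shapes; it is not nussl's exact apply_mask code.

import numpy as np

# Stand-in mixture STFT and soft mask of the same shape (freq x time).
stft = np.random.randn(1025, 400) + 1j * np.random.randn(1025, 400)
soft_mask = np.random.rand(1025, 400)

# Element-wise multiplication keeps the energy the mask attributes to this source.
masked_stft = stft * soft_mask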

make_audio_signals()

Applies each mask in self.masks and returns a list of audio_signal objects for each source.

Returns:self.sources (np.array) – An array of audio_signal objects containing each separated source
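
Building on the example above, a possible follow-up writes each separated source to disk; write_audio_to_file is assumed here to be AudioSignal's standard write method.

for i, source in enumerate(separation.make_audio_signals()):
    source.write_audio_to_file('source_{}.wav'.format(i))
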
BINARY_MASK = 'binary'
SOFT_MASK = 'soft'
audio_signal

Copy of the audio_signal.AudioSignal object passed in upon initialization.

Type:(audio_signal.AudioSignal)
classmethod from_json(json_string)

Creates a new SeparationBase object from the parameters stored in this JSON string.

Parameters:json_string (str) – A JSON string containing all the data to create a new SeparationBase object.
Returns:(SeparationBase) A new SeparationBase object from the JSON string.

See also

to_json() to make a JSON string to freeze this object.
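A hedged round-trip sketch of freezing and restoring a separation object, using RepetSim as the example class (any SeparationBase-derived class works the same way; the file path is illustrative):

import nussl

mixture = nussl.AudioSignal('path/to/input.wav')
repet = nussl.RepetSim(mixture)

frozen = repet.to_json()                      # JSON string capturing the object's state
restored = nussl.RepetSim.from_json(frozen)   # new object rebuilt from that string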

mask_threshold

PROPERTY

Threshold for determining True/False if mask_type is BINARY_MASK. Some algorithms will first make a soft mask and then convert it to a binary mask using this threshold parameter. All values of the soft mask are between [0.0, 1.0], and as such mask_threshold is expected to be a float between [0.0, 1.0].

Returns:mask_threshold (float) – Value between [0.0, 1.0] that indicates the True/False cutoff when converting a soft mask to binary mask.
Raises:ValueError if not a float or if set outside [0.0, 1.0].
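The conversion this threshold controls amounts to a simple comparison. The numpy sketch below is illustrative (0.5 is shown as a typical cutoff), not nussl's internal code.

import numpy as np

soft_mask = np.random.rand(1025, 400)     # soft mask values in [0.0, 1.0]
mask_threshold = 0.5                      # typical cutoff

binary_mask = soft_mask > mask_threshold  # True where the mask exceeds the threshold
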
mask_type

PROPERTY

This property indicates what type of mask the derived algorithm will create and be returned by run(). Options are either ‘soft’ or ‘binary’. mask_type is usually set when initializing a MaskSeparationBase-derived class and defaults to SOFT_MASK.

This property, though stored as a string, can be set in two ways when initializing:

  • First, it is possible to set this property with a string. Only 'soft' and 'binary' are accepted (case insensitive); every other value will raise an error. When initializing with a string, two helper attributes are provided: BINARY_MASK and SOFT_MASK.

    It is HIGHLY encouraged to use these, as the API may change and code that uses bare strings (e.g. mask_type = 'soft' or mask_type = 'binary') for assignment might not be future-proof. BINARY_MASK and SOFT_MASK are safe aliases in case these underlying types change.

  • The second way to set this property is with a class prototype: either the separation.masks.binary_mask.BinaryMask or the separation.masks.soft_mask.SoftMask class. This is probably the most stable way to set this, and it's fairly succinct. For example, mask_type = nussl.BinaryMask or mask_type = nussl.SoftMask are both perfectly valid.

Though uncommon, this property can also be set outside of __init__().

Examples of both methods are shown below.

Returns:mask_type (str) – Either 'soft' or 'binary'.
Raises:ValueError if set invalidly.

Example:

import nussl
mixture_signal = nussl.AudioSignal()

# Two options for determining mask upon init...

# Option 1: Init with a string (BINARY_MASK is a string 'constant')
repet_sim = nussl.RepetSim(mixture_signal, mask_type=nussl.MaskSeparationBase.BINARY_MASK)

# Option 2: Init with a class type
ola = nussl.OverlapAdd(mixture_signal, mask_type=nussl.SoftMask)

# It's also possible to change these values after init by changing the `mask_type` property...
repet_sim.mask_type = nussl.MaskSeparationBase.SOFT_MASK  # using a string
ola.mask_type = nussl.BinaryMask  # or using a class type

ones_mask(shape)

Creates a new ones mask with this object's type.

Parameters:shape

Returns:

project_embeddings(num_dimensions)

Does a PCA projection of the embedding space.

Parameters:num_dimensions – Number of dimensions to project the embeddings down to.

Returns:
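The projection can be sketched with scikit-learn's PCA on a stand-in embedding array; shapes are illustrative assumptions, and this is not nussl's internal code.

import numpy as np
from sklearn.decomposition import PCA

# Stand-in embeddings: one 20-dimensional vector per time-frequency bin.
embeddings = np.random.randn(60000, 20)

projected = PCA(n_components=2).fit_transform(embeddings)   # (60000, 2)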

sample_rate

Sample rate of audio_signal. Literally audio_signal.sample_rate.

Type:(int)
stft_params

spectral_utils.StftParams of audio_signal. Literally audio_signal.stft_params.

Type:(spectral_utils.StftParams)
to_json()

Outputs JSON from the data stored in this object.

Returns:(str) a JSON string containing all of the information to restore this object exactly as it was when this was called.

See also

from_json() to restore a JSON frozen object.

zeros_mask(shape)

Creates a new zeros mask with this object's type.

Parameters:shape

Returns:

plot(**kwargs)

Plots relevant information for deep clustering onto the active figure, given by matplotlib.pyplot.figure() outside of this function. The three plots are:

  1. PCA of embeddings onto 2 dimensions for visualization.
  2. The mixture mel-spectrogram.
  3. The source assignments of each tf-bin in the mixture spectrogram.
Returns:None