NMF MFCC Class

class nussl.separation.nmf_mfcc.NMF_MFCC(input_audio_signal, num_sources, num_templates=50, num_iterations=50, random_seed=None, distance_measure='euclidean', kmeans_kwargs=None, to_mono=False, mask_type='binary', mfcc_range=(1, 14), n_mfcc=20)

Bases: nussl.separation.mask_separation_base.MaskSeparationBase

Non-Negative Matrix Factorization using K-Means Clustering on MFCC (NMF MFCC) is a source separation algorithm that runs TransformerNMF on the magnitude spectrogram of an input audio signal. The templates matrix is then converted to mel-space to reduce its dimensionality, and K-Means clusters the converted templates and the activations matrices. The dot product of the templates and activations that belong to one cluster yields a magnitude spectrogram containing only one separated source. That spectrogram is used to create a Binary Mask object, and the process is repeated for each cluster to return a list of Audio Signal objects, one per separated source.
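The last steps of that description (grouping templates by cluster, multiplying each group by its activations, and masking) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not nussl's implementation: the `templates` and `activations` matrices are random stand-ins for NMF output, `labels` stands in for the K-Means cluster labels, the helper name `masks_from_clusters` is hypothetical, and the binary mask rule used here (the source with the largest estimated magnitude wins each bin) is an assumption.

```python
import numpy as np

def masks_from_clusters(templates, activations, labels, num_sources):
    """Build one binary mask per cluster label (hypothetical helper)."""
    estimates = []
    for source in range(num_sources):
        idx = np.flatnonzero(labels == source)
        # The dot product of this cluster's templates and activations gives
        # a magnitude spectrogram estimate for a single source.
        estimates.append(templates[:, idx] @ activations[idx, :])
    estimates = np.stack(estimates)        # (num_sources, freq, time)
    winner = estimates.argmax(axis=0)      # loudest source in each bin
    return [winner == source for source in range(num_sources)]

rng = np.random.default_rng(0)
templates = rng.random((6, 4))     # 6 frequency bins, 4 templates
activations = rng.random((4, 5))   # 4 templates, 5 time frames
labels = np.array([0, 1, 0, 1])    # stand-in for K-Means cluster labels

masks = masks_from_clusters(templates, activations, labels, num_sources=2)
```

Because every time-frequency bin is assigned to exactly one source, the resulting masks partition the spectrogram.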

Parameters:
  • input_audio_signal (audio_signal.AudioSignal) – The audio_signal.AudioSignal object that NMF MFCC will be run on. This makes a copy of input_audio_signal.
  • num_sources (int) – Number of sources to find.
  • num_templates (int) – Number of template vectors to use in NMF. Defaults to 50.
  • num_iterations (int) – The number of iterations to go through in NMF. Defaults to 50.
  • random_seed (int) – The seed to use in the numpy random generator in NMF and KMeans. See code examples below for how this is used. Default uses no seed.
  • distance_measure (str) – The type of distance measure to use in NMF - euclidean or divergence. Defaults to euclidean.
  • kmeans_kwargs (dict) – The kwargs for KMeans parameters. Can be initialized with a dictionary of keys corresponding to parameters in KMeans. See below for an example. Default is None.
  • to_mono (bool) – Converts signal to mono before running algorithm. Defaults to False.
  • mask_type (str) – The type of mask, soft or binary, used for the source separation. Defaults to 'binary'.
  • mfcc_range (int,list,tuple) – The range of MFCCs used for clustering. See examples below. Defaults to (1, 14).
  • n_mfcc (int) – The maximum number of MFCCs to use. Defaults to 20.
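Since mfcc_range accepts an int, a list, or a tuple, one plausible way to reduce it to a start and end index is sketched below. The helper name `resolve_mfcc_range` is hypothetical, and two behaviors are assumptions rather than documented nussl behavior: a bare int is read as the upper bound with the range starting at 1, and the range is required to fit within n_mfcc.

```python
def resolve_mfcc_range(mfcc_range=(1, 14), n_mfcc=20):
    """Return (start, end) MFCC indices; hypothetical helper, not nussl API."""
    if isinstance(mfcc_range, int):
        # Assumption: a bare int is the upper bound, starting from 1.
        start, end = 1, mfcc_range
    elif isinstance(mfcc_range, (list, tuple)) and len(mfcc_range) == 2:
        start, end = mfcc_range
    else:
        raise ValueError('mfcc_range must be an int or a (start, end) pair')
    # Assumption: the range must fit within the n_mfcc coefficients computed.
    if not 0 <= start < end <= n_mfcc:
        raise ValueError('mfcc_range must lie within 0..n_mfcc')
    return start, end

print(resolve_mfcc_range())         # (1, 14)
print(resolve_mfcc_range(10))       # (1, 10)
print(resolve_mfcc_range([2, 18]))  # (2, 18)
```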
input_audio_signal

The audio_signal.AudioSignal object that NMF MFCC will be run on. This makes a copy of input_audio_signal.

Type:audio_signal.AudioSignal
clusterer

A scikit-learn KMeans object for clustering the templates and activations.

Type:KMeans
signal_stft

The stft data for the input audio signal.

Type:np.ndarray
labeled_templates

A NumPy array containing the columns of the templates matrix that are labeled as belonging to a particular source.

Type:list
sources

A list containing the lists of Audio Signal objects for each source.

Type:list
result_masks

A list containing the lists of Binary Mask objects for each channel.

Type:list

Initializing Example:

BINARY_MASK = 'binary'
SOFT_MASK = 'soft'
audio_signal

Copy of the audio_signal.AudioSignal object passed in upon initialization.

Type:(audio_signal.AudioSignal)
classmethod from_json(json_string)

Creates a new SeparationBase object from the parameters stored in this JSON string.

Parameters:json_string (str) – A JSON string containing all the data to create a new SeparationBase object.
Returns:(SeparationBase) A new SeparationBase object from the JSON string.

See also

to_json() to make a JSON string to freeze this object.

mask_threshold

PROPERTY

Threshold for determining True/False if mask_type is BINARY_MASK. Some algorithms first make a soft mask and then convert it to a binary mask using this threshold. All values of the soft mask lie in the range [0.0, 1.0], so mask_threshold is expected to be a float in the range [0.0, 1.0].

Returns:mask_threshold (float) – Value between [0.0, 1.0] that indicates the True/False cutoff when converting a soft mask to binary mask.
Raises:ValueError if not a float or if set outside [0.0, 1.0].
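The soft-to-binary conversion described above can be sketched as follows. This is an illustration only: the helper name `soft_to_binary`, the default of 0.5, and the choice of a strict `>` comparison are assumptions, not confirmed nussl behavior.

```python
import numpy as np

def soft_to_binary(soft_mask, mask_threshold=0.5):
    """Threshold a soft mask in [0.0, 1.0] into a boolean mask (illustrative)."""
    if not isinstance(mask_threshold, float) or not 0.0 <= mask_threshold <= 1.0:
        raise ValueError('mask_threshold must be a float in [0.0, 1.0]')
    # Assumption: values strictly above the threshold become True.
    return np.asarray(soft_mask) > mask_threshold

soft = np.array([[0.1, 0.6],
                 [0.5, 0.9]])
binary = soft_to_binary(soft, mask_threshold=0.5)
```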
mask_type

PROPERTY

This property indicates what type of mask the derived algorithm creates and returns from run(). Options are either ‘soft’ or ‘binary’. mask_type is usually set when initializing a MaskSeparationBase-derived class and defaults to SOFT_MASK.

This property, though stored as a string, can be set in two ways when initializing:

  • First, it is possible to set this property with a string. Only 'soft' and 'binary' are accepted (case insensitive); any other value raises an error. When initializing with a string, two helper attributes are provided: BINARY_MASK and SOFT_MASK.

    It is HIGHLY encouraged to use these, as the API may change and code that uses bare strings (e.g. mask_type = 'soft' or mask_type = 'binary') for assignment might not be future-proof. BINARY_MASK and SOFT_MASK are safe aliases in case these underlying types change.

  • The second way to set this property is by using a class prototype of either the separation.masks.binary_mask.BinaryMask or the separation.masks.soft_mask.SoftMask class. This is probably the most stable way to set this, and it’s fairly succinct. For example, mask_type = nussl.BinaryMask or mask_type = nussl.SoftMask are both perfectly valid.

Though uncommon, this can be set outside of __init__().

Examples of both methods are shown below.

Returns:mask_type (str) – Either 'soft' or 'binary'.
Raises:ValueError if set invalidly.
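The two accepted input forms and the error case can be illustrated with a small normalization routine. This is a sketch, not the library's code: `normalize_mask_type` is a hypothetical helper, and the `BinaryMask` and `SoftMask` classes here are empty stand-ins for the real mask classes.

```python
BINARY_MASK = 'binary'
SOFT_MASK = 'soft'

class BinaryMask:   # stand-in for separation.masks.binary_mask.BinaryMask
    pass

class SoftMask:     # stand-in for separation.masks.soft_mask.SoftMask
    pass

def normalize_mask_type(value):
    """Map a string or mask class to 'soft' or 'binary' (hypothetical helper)."""
    if isinstance(value, str):
        lowered = value.lower()          # strings are case insensitive
        if lowered in (BINARY_MASK, SOFT_MASK):
            return lowered
    elif value is BinaryMask:
        return BINARY_MASK
    elif value is SoftMask:
        return SOFT_MASK
    # Anything else, including an unknown string, is rejected.
    raise ValueError(f'Invalid mask_type: {value!r}')

print(normalize_mask_type('Binary'))   # binary
print(normalize_mask_type(SoftMask))   # soft
```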

Example:

import nussl
mixture_signal = nussl.AudioSignal()

# Two options for determining mask upon init...

# Option 1: Init with a string (BINARY_MASK is a string 'constant')
repet_sim = nussl.RepetSim(mixture_signal, mask_type=nussl.MaskSeparationBase.BINARY_MASK)

# Option 2: Init with a class type
ola = nussl.OverlapAdd(mixture_signal, mask_type=nussl.SoftMask)

# It's also possible to change these values after init by changing the `mask_type` property...
repet_sim.mask_type = nussl.MaskSeparationBase.SOFT_MASK  # using a string
ola.mask_type = nussl.BinaryMask  # or using a class type
ones_mask(shape)

Creates a new ones mask with this object’s type.

Parameters:shape

Returns:

plot(output_name, **kwargs)

Plots relevant data for mask-based separation algorithm. Base class: Do not call directly!

Raises:NotImplementedError – Cannot call base class!
run()

This function calls TransformerNMF on the magnitude spectrogram of each channel in the input audio signal. The templates and activation matrices returned are clustered using K-Means clustering. These clusters are used to create mask objects for each source. Note: The masks in self.result_masks are not returned in a particular order corresponding to the sources, but they are in the same order for each channel.

Returns:result_masks (list) – A list of MaskBase-derived objects for each source. To get a list of AudioSignal-derived objects, call make_audio_signals().

Example:

import nussl

# Load the mixture from disk
signal = nussl.AudioSignal(path_to_input_file='input_name.wav')

# Set up and run NMF MFCC
nmf_mfcc = nussl.NMF_MFCC(signal, num_sources=2)  # Returns a binary mask by default
masks = nmf_mfcc.run()

# Get audio signals
sources = nmf_mfcc.make_audio_signals()

# Output the sources
for i, source in enumerate(sources):
    output_file_name = str(i) + '.wav'
    source.write_audio_to_file(output_file_name)
sample_rate

Sample rate of audio_signal. Literally audio_signal.sample_rate.

Type:(int)
stft_params

spectral_utils.StftParams of audio_signal. Literally audio_signal.stft_params.

Type:(spectral_utils.StftParams)
to_json()

Outputs JSON from the data stored in this object.

Returns:(str) a JSON string containing all of the information to restore this object exactly as it was when this was called.

See also

from_json() to restore a JSON frozen object.
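The freeze and restore round trip between to_json() and from_json() can be sketched with the standard json module. The class `SeparationSettings` below is a hypothetical stand-in that holds only plain parameters; it is not a nussl class, and a real separation object would also need custom encoding for arrays and audio data.

```python
import json

class SeparationSettings:
    """Hypothetical stand-in for a separation object; not a nussl class."""

    def __init__(self, num_sources, mask_type='binary', mask_threshold=0.5):
        self.num_sources = num_sources
        self.mask_type = mask_type
        self.mask_threshold = mask_threshold

    def to_json(self):
        # Freeze every attribute needed to rebuild the object.
        return json.dumps(self.__dict__, sort_keys=True)

    @classmethod
    def from_json(cls, json_string):
        # Build a new object from the stored parameters.
        return cls(**json.loads(json_string))

frozen = SeparationSettings(num_sources=2, mask_type='soft').to_json()
restored = SeparationSettings.from_json(frozen)
```

Serializing the restored object again should reproduce the same string, which is a convenient check that nothing was lost.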

zeros_mask(shape)

Creates a new zeros mask with this object’s type.

Parameters:shape

Returns:

make_audio_signals()

Applies each mask in self.masks and returns a list of audio_signal objects for each source.

Returns:self.sources (np.array) – An array of audio_signal objects containing each separated source
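Applying a mask amounts to an element-wise multiplication with the mixture's STFT. The sketch below shows only that step, under assumptions: `apply_masks` is a hypothetical helper, the STFT is random stand-in data, and the inverse STFT that would turn each result back into a time-domain signal is omitted.

```python
import numpy as np

def apply_masks(stft, masks):
    """Return one masked STFT per mask (hypothetical helper, not nussl API)."""
    # Element-wise multiplication keeps a bin where a binary mask is True
    # (or scales it for a soft mask) and zeroes it elsewhere.
    return [stft * mask for mask in masks]

rng = np.random.default_rng(0)
stft = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
mask_a = rng.random((4, 3)) > 0.5
masks = [mask_a, ~mask_a]          # two complementary binary masks

source_stfts = apply_masks(stft, masks)
```

With complementary binary masks, the masked STFTs sum back to the original mixture.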