RepetSim

The REpeating Pattern Extraction Technique using the Similarity Matrix (REPET-SIM)

class nussl.separation.repet_sim.RepetSim(input_audio_signal, similarity_threshold=None, min_distance_between_frames=None, max_repeating_frames=None, high_pass_cutoff=None, do_mono=False, use_librosa_stft=False, matlab_fidelity=False, mask_type='soft', mask_threshold=0.5)

Bases: nussl.separation.mask_separation_base.MaskSeparationBase

Implements the REpeating Pattern Extraction Technique algorithm using the Similarity Matrix (REPET-SIM).

REPET is a simple method for separating the repeating background from the non-repeating foreground in an audio mixture. REPET-SIM is a generalization of REPET that looks for similarities between frames instead of periodicities.

References

  • Zafar Rafii and Bryan Pardo. “Audio Separation System and Method,” US20130064379 A1, US 13/612,413, March 14, 2013
  • Zafar Rafii and Bryan Pardo. “Music/Voice Separation using the Similarity Matrix,” 13th International Society for Music Information Retrieval Conference (ISMIR), Porto, Portugal, October 8-12, 2012.

Parameters:

Examples

The RepetSim Demo Example

run()

Runs REPET-SIM, a variant of REPET that uses the cosine similarity matrix to find similar frames, which are then median filtered.

Returns:
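
The idea behind run() can be sketched in a few lines of NumPy. This is a simplified illustration and not nussl's implementation: the function name repeating_soft_mask and the num_similar parameter are hypothetical, and the sketch ignores similarity_threshold, min_distance_between_frames, and high_pass_cutoff. It assumes a magnitude spectrogram of shape (frequency bins, time frames).

```python
import numpy as np


def repeating_soft_mask(magnitude, num_similar=3):
    """Estimate a soft mask for the repeating background (illustrative only).

    magnitude: non-negative array of shape (n_freq, n_frames).
    num_similar: hypothetical parameter, number of most similar frames
                 (including the frame itself) used for the median.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    frames = magnitude.T  # one row per time frame

    # Cosine similarity between every pair of time frames.
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid dividing by zero for silent frames
    unit = frames / norms
    similarity = unit @ unit.T  # shape (n_frames, n_frames)

    # For each frame, indices of its most similar frames.
    nearest = np.argsort(-similarity, axis=1)[:, :num_similar]

    # Median over the similar frames gives the repeating model.
    repeating = np.median(frames[nearest], axis=1).T  # (n_freq, n_frames)

    # The repeating part cannot exceed the mixture itself.
    repeating = np.minimum(repeating, magnitude)

    # Soft mask: ratio of repeating energy to mixture energy, in [0, 1].
    mask = np.divide(repeating, magnitude,
                     out=np.zeros_like(magnitude), where=magnitude > 0)
    return mask
```

Frames that contain a non-repeating event receive mask values below 1 in the affected bins, because the median over similar frames suppresses the outlier.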

static compute_similarity_matrix(matrix)

Computes the cosine similarity matrix for any given input matrix X.

Parameters:matrix (np.array) – 2D matrix containing the magnitude spectrogram of the audio signal
Returns:S (np.array) – 2D similarity matrix
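
For reference, a cosine similarity matrix can be computed with NumPy as sketched below. The helper name cosine_similarity_rows is hypothetical, and the sketch assumes that the rows of the input are the vectors being compared; whether nussl compares rows or columns, and how it handles all-zero vectors, is not stated here.

```python
import numpy as np


def cosine_similarity_rows(matrix):
    """Return S where S[i, j] is the cosine similarity of rows i and j."""
    matrix = np.asarray(matrix, dtype=float)

    # Normalize every row to unit length; leave all-zero rows as zeros.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    unit_rows = matrix / norms

    # Dot products of unit vectors are cosine similarities.
    return unit_rows @ unit_rows.T
```

To compare the time frames of a spectrogram stored as (frequency bins, time frames), pass its transpose so that each row is one frame.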
get_similarity_matrix()

Calculates and returns the similarity matrix for the audio file associated with this object.

Returns:similarity_matrix (np.array) – similarity matrix for the audio file.

EXAMPLE:

# Set up audio signal
signal = nussl.AudioSignal('path_to_file.wav')

# Set up a RepetSim object
repet_sim = nussl.RepetSim(signal)

# run() does not need to be called to get a similarity matrix for signal
sim_mat = repet_sim.get_similarity_matrix()
make_audio_signals()

Returns the background and foreground audio signals. Call run() before calling this method; it returns None if run() has not been called.

Returns:Audio Signals (List)

2 element list.

  • bkgd: Audio signal with the calculated background track
  • fkgd: Audio signal with the calculated foreground track
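
How a background mask yields two complementary signals can be sketched as follows. This is an assumption-laden illustration, not nussl's code: the helper split_with_mask is hypothetical, and it assumes the foreground is obtained with the complementary mask (1 - mask) applied to the mixture's complex STFT.

```python
import numpy as np


def split_with_mask(mixture_stft, background_mask):
    """Split a complex STFT into background and foreground parts.

    background_mask: array with values in [0, 1], same shape as mixture_stft.
    """
    mixture_stft = np.asarray(mixture_stft)
    background_mask = np.asarray(background_mask, dtype=float)

    background = background_mask * mixture_stft
    foreground = (1.0 - background_mask) * mixture_stft  # complementary mask
    return background, foreground
```

With complementary masks, the two parts always sum back to the original mixture.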

Example:

# set up AudioSignal object
signal = nussl.AudioSignal('path_to_file.wav')

# set up and run RepetSim
repet_sim = nussl.RepetSim(signal)
repet_sim.run()

# get audio signals (AudioSignal objects)
background, foreground = repet_sim.make_audio_signals()
BINARY_MASK = 'binary'
SOFT_MASK = 'soft'
audio_signal

Copy of the audio_signal.AudioSignal object passed in upon initialization.

Type:(audio_signal.AudioSignal)
classmethod from_json(json_string)

Creates a new SeparationBase object from the parameters stored in this JSON string.

Parameters:json_string (str) – A JSON string containing all the data to create a new SeparationBase object.
Returns:(SeparationBase) A new SeparationBase object from the JSON string.

See also

to_json() to make a JSON string to freeze this object.

mask_threshold

PROPERTY

Threshold used to decide True/False when mask_type is BINARY_MASK. Some algorithms first make a soft mask and then convert it to a binary mask using this threshold. All values of a soft mask lie in the range [0.0, 1.0], so mask_threshold is expected to be a float in [0.0, 1.0].

Returns:mask_threshold (float) – Value in [0.0, 1.0] that indicates the True/False cutoff when converting a soft mask to a binary mask.
Raises:ValueError if not a float or if set outside [0.0, 1.0].
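
The conversion from a soft mask to a binary mask can be sketched as below. The helper binarize_mask is hypothetical, and the use of a strict "greater than" comparison is an assumption; the library may treat values equal to the threshold differently.

```python
import numpy as np


def binarize_mask(soft_mask, mask_threshold=0.5):
    """Convert a soft mask with values in [0, 1] to a boolean mask."""
    # Mirror the documented contract: a float in [0.0, 1.0].
    if not isinstance(mask_threshold, float) or not 0.0 <= mask_threshold <= 1.0:
        raise ValueError('mask_threshold must be a float in [0.0, 1.0]')

    # Assumption: values strictly above the threshold become True.
    return np.asarray(soft_mask, dtype=float) > mask_threshold
```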
mask_type

PROPERTY

This property indicates what type of mask the derived algorithm will create and return from run(). Options are either ‘soft’ or ‘binary’. mask_type is usually set when initializing a MaskSeparationBase-derived class and defaults to SOFT_MASK.

This property, though stored as a string, can be set in two ways when initializing:

  • First, it is possible to set this property with a string. Only 'soft' and 'binary' are accepted (case insensitive); every other value will raise an error. When initializing with a string, two helper attributes are provided: BINARY_MASK and SOFT_MASK.

    It is HIGHLY encouraged to use these, as the API may change and code that uses bare strings (e.g. mask_type = 'soft' or mask_type = 'binary') for assignment might not be future-proof. BINARY_MASK and SOFT_MASK are safe aliases in case these underlying types change.

  • The second way to set this property is with a class: either separation.masks.binary_mask.BinaryMask or separation.masks.soft_mask.SoftMask. This is probably the most stable way to set it, and it’s fairly succinct. For example, mask_type = nussl.BinaryMask and mask_type = nussl.SoftMask are both valid.

Though uncommon, this property can also be set outside of __init__().

Examples of both methods are shown below.

Returns:mask_type (str) – Either 'soft' or 'binary'.
Raises:ValueError if set invalidly.

Example:

import nussl
mixture_signal = nussl.AudioSignal()

# Two options for determining mask upon init...

# Option 1: Init with a string (BINARY_MASK is a string 'constant')
repet_sim = nussl.RepetSim(mixture_signal, mask_type=nussl.MaskSeparationBase.BINARY_MASK)

# Option 2: Init with a class type
ola = nussl.OverlapAdd(mixture_signal, mask_type=nussl.SoftMask)

# It's also possible to change these values after init by changing the `mask_type` property...
repet_sim.mask_type = nussl.MaskSeparationBase.SOFT_MASK  # using a string
ola.mask_type = nussl.BinaryMask  # or using a class type
ones_mask(shape)
Parameters:shape

Returns:

plot(output_file, **kwargs)

Plots relevant data for mask-based separation algorithm. Base class: Do not call directly!

Raises:NotImplementedError – Cannot call base class!
sample_rate

Sample rate of audio_signal. Literally audio_signal.sample_rate.

Type:(int)
stft_params

spectral_utils.StftParams of audio_signal. Literally audio_signal.stft_params.

Type:(spectral_utils.StftParams)
to_json()

Outputs JSON from the data stored in this object.

Returns:(str) a JSON string containing all of the information to restore this object exactly as it was when this was called.

See also

from_json() to restore a JSON frozen object.

zeros_mask(shape)

Creates a new zeros mask with this object’s type.

Parameters:shape

Returns: