datasets.py

While nussl does not ship with any data sets, it can interface with many common source separation data sets used in the MIR and speech separation communities. These data set “hooks” are implemented as generator functions. They let the user loop through each file in the data set and get separate AudioSignal objects for the mixture and for each individual source.
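The hooks follow the standard Python generator pattern: each iteration loads only the files for one song, so memory use stays bounded however large the data set is. The sketch below illustrates that pattern in plain Python. It is not nussl's code; `iterate_dataset` and `fake_loader` are hypothetical names, and the loader stands in for the AudioSignal construction nussl performs.

```python
from typing import Callable, Iterable, Iterator, Tuple


def iterate_dataset(file_paths: Iterable[str],
                    load_song: Callable[[str], Tuple]) -> Iterator[Tuple]:
    """Hypothetical sketch of a data set hook: lazily load one song per iteration."""
    for path in file_paths:
        # Nothing is read until the caller asks for the next item, so only the
        # current song's signals are held in memory at any one time.
        yield load_song(path)


loaded = []  # records which files have actually been loaded so far


def fake_loader(path):
    # Hypothetical stand-in for building (mixture, vocals, accompaniment) AudioSignals
    loaded.append(path)
    return (path + ":mix", path + ":vox", path + ":acc")


gen = iterate_dataset(["a.wav", "b.wav", "c.wav"], fake_loader)
first = next(gen)  # only "a.wav" has been loaded at this point
```

Because loading happens inside the loop, a `for` loop over the generator touches one song at a time, which is the behaviour the nussl hooks describe.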

nussl.datasets.iKala(directory, check_hash=True, subset=None, shuffle=False, seed=None)

Generator function for the iKala data set. This allows you to loop through the entire data set with only a few AudioSignal objects stored in memory at a time. There are options for only looping through a subset of the data set and shuffling the data set (with a seed). See details about those options below.

nussl calculates the hash of the iKala directory and compares it against a precomputed hash for iKala that ships with nussl. This hash is used to verify that nussl understands the directory structure when reading the files. Hash calculation can be turned off (check_hash=False) if the user needs a speedup, but this might cause obscure errors if the iKala directory is not set up in the same way as a fresh download of iKala.
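The documented check_hash behaviour (True raises, 'warn' prints a warning, False skips the calculation) can be sketched as follows. This is an illustration under stated assumptions, not nussl's implementation: `hash_directory` and `check_directory_hash` are hypothetical names, the sketch hashes only the relative file paths (nussl may hash different information), and a `ValueError` stands in for whatever exception nussl actually raises.

```python
import hashlib
import os
import warnings


def hash_directory(directory):
    """Hypothetical sketch: hash the relative paths of every file under `directory`.

    Only the directory *structure* is hashed (not file contents), which is cheap
    and enough to detect missing, renamed, or moved files.
    """
    hasher = hashlib.sha256()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # sorting in place makes the os.walk traversal order deterministic
        for name in sorted(files):
            rel_path = os.path.relpath(os.path.join(root, name), directory)
            # Normalise separators and add a delimiter so different layouts cannot collide
            hasher.update((rel_path.replace(os.sep, "/") + "\n").encode("utf-8"))
    return hasher.hexdigest()


def check_directory_hash(directory, expected_hash, check_hash=True):
    """Hypothetical sketch of the check_hash semantics described in the parameters."""
    if check_hash is False:
        return  # the hash is not calculated at all
    if hash_directory(directory) != expected_hash:
        message = "Hash of {} does not match the expected hash.".format(directory)
        if check_hash == "warn":
            warnings.warn(message)
        else:
            raise ValueError(message)  # stand-in for nussl's own exception type
```

Skipping the check saves a full directory walk, which is the speedup the text refers to; the cost is that a non-standard layout is only discovered later, when a file fails to load.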

Examples

The following example loops through the iKala data set by using the generator function directly in a for loop.

import os

import nussl

iKala_path = '/path/to/iKala'  # the iKala directory on disc
for mix, vox, acc in nussl.datasets.iKala(iKala_path):
    mix.to_mono(overwrite=True)  # mix the two hard-panned channels down to mono

    # Get some basic metadata on the files.
    # (They'll all have the same file name, but different labels)
    print('Mixture       - Filename: {}, Label: {}'.format(mix.file_name, mix.label))
    print('Vocals        - Filename: {}, Label: {}'.format(vox.file_name, vox.label))
    print('Accompaniment - Filename: {}, Label: {}'.format(acc.file_name, acc.label))

    # Run an algorithm on the iKala files and save to disc
    r = nussl.Repet(mix)
    r.run()
    bg_est, fg_est = r.make_audio_signals()
    bg_est.write_audio_to_file('{}_bg.wav'.format(os.path.splitext(mix.file_name)[0]))
    fg_est.write_audio_to_file('{}_fg.wav'.format(os.path.splitext(mix.file_name)[0]))

It’s also possible to use tqdm to print the progress to the console. This is useful because running through an entire data set can take a while. Here’s a more advanced example using some other options as well:

import os

import nussl
from tqdm import tqdm

iKala_path = '/path/to/iKala'  # the iKala directory on disc
idxs = list(range(29, 150, 2))  # Only get every other song between [29, 150)
iKala_gen = nussl.datasets.iKala(iKala_path, subset=idxs, check_hash=False)

# Tell tqdm the number of files we're running on so it can estimate a completion time
for mix, vox, acc in tqdm(iKala_gen, total=len(idxs)):
    mix.to_mono(overwrite=True)  # mix the two hard-panned channels down to mono

    # Run an algorithm on the iKala files and save to disc
    r = nussl.Repet(mix)
    r.run()
    bg_est, fg_est = r.make_audio_signals()
    bg_est.write_audio_to_file('{}_bg.wav'.format(os.path.splitext(mix.file_name)[0]))
    fg_est.write_audio_to_file('{}_fg.wav'.format(os.path.splitext(mix.file_name)[0]))
Parameters:
  • directory (str) – Top-level directory for the iKala data set.
  • check_hash (bool, str) – In the case that there is a mismatch between the expected and calculated hash, if this parameter is True (a bool) an exception is raised, and if this parameter is 'warn' (a string) a warning is printed to the console. If this parameter is False, the hash will not be calculated for this directory, i.e., no hash check is performed.
  • subset (float, list, str, None) – This parameter determines how to make a subset of the audio files in the data set. There are four ways to use it, depending on what type this parameter takes:
      1) If subset is a float, it is expected to be in the range [0.0, 1.0], and the first subset × 100% of the audio files are returned (e.g., 0.25 returns the first 25%).
      2) If subset is a list, it is expected to be a list of indices (as ints). Only the audio files that correspond to those indices are produced.
      3) If subset is a str, only audio files with that string somewhere in their directory name are included.
      4) If subset is None, the whole data set is traversed.
  • shuffle (bool) – Whether the data set should be shuffled.
  • seed (int, 1-d array_like) – Seed for numpy’s random number generator used for shuffling.
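The four subset modes and the seeded shuffle described above can be sketched as a single selection step over a list of file paths. This is a hypothetical helper (`select_files`), not nussl's code; it assumes the file list is sorted first so that indices are stable, and accepting a `range` alongside a list is the sketch's own choice.

```python
import os

import numpy as np


def select_files(file_list, subset=None, shuffle=False, seed=None):
    """Hypothetical sketch of the four documented `subset` modes plus shuffling."""
    files = sorted(file_list)  # assumption: a fixed, sorted order keeps indices stable

    if isinstance(subset, float):
        # 1) float in [0.0, 1.0]: keep the first fraction of the files
        if not 0.0 <= subset <= 1.0:
            raise ValueError("A float subset must be in the range [0.0, 1.0].")
        files = files[:int(len(files) * subset)]
    elif isinstance(subset, (list, range)):
        # 2) list of int indices: keep only the files at those positions
        files = [files[i] for i in subset]
    elif isinstance(subset, str):
        # 3) str: keep files whose directory name contains the string
        files = [f for f in files if subset in os.path.dirname(f)]
    elif subset is not None:
        raise TypeError("subset must be a float, a list of ints, a str, or None.")
    # 4) None: keep every file

    if shuffle:
        # Seeding numpy's RNG makes the shuffled order reproducible across runs
        rng = np.random.RandomState(seed)
        rng.shuffle(files)
    return files
```

With the same seed, two calls return the same order, which is what makes a shuffled run repeatable.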

Yields: (tuple(AudioSignal, AudioSignal, AudioSignal)) – A tuple of three AudioSignal objects, with audio loaded for each source, returned in the order (mixture, vocals, accompaniment). In iKala, the vocals are hard panned to one channel and the accompaniment to the other. The ‘mixture’ yielded by this function reflects this and needs to be mixed down to mono: mixture is a stereo AudioSignal object in which each channel holds one source, and vocals and accompaniment are mono AudioSignal objects, each made from a single channel of mixture.
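What the hard-panned layout means for the arrays can be sketched with NumPy alone. `split_hard_panned` is a hypothetical helper, not part of nussl; it assumes a plain array of shape (2, n_samples), and which channel carries the vocals is an assumption passed in as `vocal_channel`. The sketch sums the channels so that the mixture equals vocals + accompaniment; whether nussl's `to_mono` sums or averages is not stated here, and averaging only scales the result by one half.

```python
import numpy as np


def split_hard_panned(stereo, vocal_channel=1):
    """Hypothetical sketch: split a hard-panned stereo array into its two sources.

    `stereo` has shape (2, n_samples). Which channel holds the vocals is an
    assumption; check it against your copy of the data set.
    """
    if stereo.ndim != 2 or stereo.shape[0] != 2:
        raise ValueError("Expected an array of shape (2, n_samples).")
    vocals = stereo[vocal_channel]
    accompaniment = stereo[1 - vocal_channel]
    # Summing the two channels gives a single-channel mixture of both sources
    mixture = stereo.sum(axis=0)
    return mixture, vocals, accompaniment


stereo = np.array([[1.0, 2.0, 3.0],     # channel 0: accompaniment (assumed)
                   [0.5, 0.5, -1.0]])   # channel 1: vocals (assumed)
mix, vox, acc = split_hard_panned(stereo)
```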
nussl.datasets.mir1k(directory, check_hash=True, subset=None, shuffle=False, seed=None, undivided=False)

Generator function for the MIR-1K data set. This allows you to loop through the entire data set with only a few AudioSignal objects stored in memory at a time. There are options for only looping through a subset of the data set and shuffling the data set (with a seed). See details about those options below.

nussl calculates the hash of the MIR-1K directory and compares it against a precomputed hash for MIR-1K that ships with nussl. This hash is used to verify that nussl understands the directory structure when reading the files. Hash calculation can be turned off (check_hash=False) if the user needs a speedup, but this might cause obscure errors if the MIR-1K directory is not set up in the same way as a fresh download of MIR-1K.

MIR-1K also ships with two ‘sets’ of audio files: the divided and undivided sets. They contain the same content; the only difference is that the undivided set has one file per song, with each song taking up the whole file, while the divided set has each song split into segments of ~3-12 seconds. The undivided parameter controls which of these two sets nussl will loop through.

Examples

The following example loops through the MIR-1K data set by using the generator function directly in a for loop.

import os

import nussl

mir1k_path = '/path/to/MIR-1K'  # the MIR-1K directory on disc
for mix, vox, acc in nussl.datasets.mir1k(mir1k_path):
    mix.to_mono(overwrite=True)  # mix the two hard-panned channels down to mono

    # Get some basic metadata on the files.
    # (They'll all have the same file name, but different labels)
    print('Mixture       - Filename: {}, Label: {}'.format(mix.file_name, mix.label))
    print('Vocals        - Filename: {}, Label: {}'.format(vox.file_name, vox.label))
    print('Accompaniment - Filename: {}, Label: {}'.format(acc.file_name, acc.label))

    # Run an algorithm on the MIR-1K files and save to disc
    r = nussl.Repet(mix)
    r.run()
    bg_est, fg_est = r.make_audio_signals()
    bg_est.write_audio_to_file('{}_bg.wav'.format(os.path.splitext(mix.file_name)[0]))
    fg_est.write_audio_to_file('{}_fg.wav'.format(os.path.splitext(mix.file_name)[0]))

It’s also possible to use tqdm to print the progress to the console. This is useful because running through an entire data set can take a while. Here’s a more advanced example using some other options as well:

import os

import nussl
from tqdm import tqdm

mir1k_path = '/path/to/MIR-1K'  # the MIR-1K directory on disc
idxs = list(range(29, 150, 2))  # Only get every other song between [29, 150)
mir1k_gen = nussl.datasets.mir1k(mir1k_path, subset=idxs,
                                 check_hash=False, undivided=True)

# Tell tqdm the number of files we're running on so it can estimate a completion time
for mix, vox, acc in tqdm(mir1k_gen, total=len(idxs)):
    mix.to_mono(overwrite=True)  # mix the two hard-panned channels down to mono

    # Run an algorithm on the MIR-1K files and save to disc
    r = nussl.Repet(mix)
    r.run()
    bg_est, fg_est = r.make_audio_signals()
    bg_est.write_audio_to_file('{}_bg.wav'.format(os.path.splitext(mix.file_name)[0]))
    fg_est.write_audio_to_file('{}_fg.wav'.format(os.path.splitext(mix.file_name)[0]))
Parameters:
  • directory (str) – Top-level directory for the MIR-1K data set.
  • check_hash (bool, str) – In the case that there is a mismatch between the expected and calculated hash, if this parameter is True (a bool) an exception is raised, and if this parameter is 'warn' (a string) a warning is printed to the console. If this parameter is False, the hash will not be calculated for this directory, i.e., no hash check is performed.
  • subset (float, list, str, None) – This parameter determines how to make a subset of the audio files in the data set. There are four ways to use it, depending on what type this parameter takes:
      1) If subset is a float, it is expected to be in the range [0.0, 1.0], and the first subset × 100% of the audio files are returned (e.g., 0.25 returns the first 25%).
      2) If subset is a list, it is expected to be a list of indices (as ints). Only the audio files that correspond to those indices are produced.
      3) If subset is a str, only audio files with that string somewhere in their directory name are included.
      4) If subset is None, the whole data set is traversed.
  • shuffle (bool) – Whether the data set should be shuffled.
  • seed (int, 1-d array_like) – Seed for numpy’s random number generator used for shuffling.
  • undivided (bool) – Whether to use the divided set (in the Wavefile directory) or the undivided set (in the UndividedWavefile directory). If True, the undivided set is used.

Yields: (tuple(AudioSignal, AudioSignal, AudioSignal)) – A tuple of three AudioSignal objects, with audio loaded for each source, returned in the order (mixture, vocals, accompaniment). In MIR-1K, the vocals are hard panned to one channel and the accompaniment to the other. The ‘mixture’ yielded by this function reflects this and needs to be mixed down to mono: mixture is a stereo AudioSignal object in which each channel holds one source, and vocals and accompaniment are mono AudioSignal objects, each made from a single channel of mixture.
nussl.datasets.timit(directory, check_hash=True, subset=None, shuffle=False, seed=None)

Not implemented yet.

Parameters:
  • directory
  • check_hash
  • subset
  • shuffle
  • seed

Yields:

nussl.datasets.medleyDB(directory, raw=False, check_hash=True, subset=None, shuffle=False, seed=None)

Not implemented yet.

Parameters:
  • directory
  • check_hash
  • subset
  • shuffle
  • seed

Returns:

nussl.datasets.musdb18(directory, check_hash=True, subset=None, folder=None, shuffle=False, seed=None)

Generator function for the MUSDB18 data set. This allows you to loop through the entire data set with only a few AudioSignal objects stored in memory at a time. There are options for only looping through a subset of the data set and shuffling the data set (with a seed). See details about those options below.

nussl calculates the hash of the MUSDB18 directory and compares it against a precomputed hash for MUSDB18 that ships with nussl. This hash is used to verify that nussl understands the directory structure when reading the files. Hash calculation can be turned off (check_hash=False) if the user needs a speedup, but this might cause obscure errors if the MUSDB18 directory is not set up in the same way as a fresh download of MUSDB18.

The audio in MUSDB18 is stored in the Stems format from Native Instruments. nussl uses the stempeg library to read these files from disc and returns each source as an individual AudioSignal object.
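Once a stem file is decoded, turning it into one array per source is a matter of indexing. The sketch below assumes the decoded audio has shape (n_stems, n_samples, n_channels), the layout the stempeg README describes for `stempeg.read_stems` (check the version you have installed), and that the streams are ordered mixture, drums, bass, other, vocals, matching the order this function yields. `split_stems` is a hypothetical helper, and transposing to (n_channels, n_samples) is an assumption about the layout an AudioSignal expects.

```python
import numpy as np

# Assumed stream order in a MUSDB18 stem file (stream 0 is the mixture)
MUSDB18_STEM_NAMES = ("mixture", "drums", "bass", "other", "vocals")


def split_stems(stems):
    """Hypothetical sketch: split a decoded stems array into one array per source.

    `stems` is assumed to have shape (n_stems, n_samples, n_channels); each
    returned array is transposed to (n_channels, n_samples).
    """
    if stems.ndim != 3 or stems.shape[0] != len(MUSDB18_STEM_NAMES):
        raise ValueError("Expected an array of shape (5, n_samples, n_channels).")
    return tuple(np.ascontiguousarray(stem.T) for stem in stems)


# Usage with a real file (requires stempeg):
#   stems, rate = stempeg.read_stems('/path/to/song.stem.mp4')
#   mix, drums, bass, other, vox = split_stems(stems)
```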

Examples

The following example loops through the MUSDB18 data set by using the generator function directly in a for loop.

import os

import nussl

musdb_path = '/path/to/MUSDB18'  # the MUSDB18 directory on disc
for mix, drums, bass, other, vox in nussl.datasets.musdb18(musdb_path):

    # Get some basic metadata on the files.
    # (They'll all have the same file name, but different labels)
    print('Mixture  - Filename: {}, Label: {}'.format(mix.file_name, mix.label))
    print('Vocals   - Filename: {}, Label: {}'.format(vox.file_name, vox.label))
    print('Drums    - Filename: {}, Label: {}'.format(drums.file_name, drums.label))

    # Run an algorithm on the MUSDB18 files and save to disc
    r = nussl.Repet(mix)
    r.run()
    bg_est, fg_est = r.make_audio_signals()
    bg_est.write_audio_to_file('{}_bg.wav'.format(os.path.splitext(mix.file_name)[0]))
    fg_est.write_audio_to_file('{}_fg.wav'.format(os.path.splitext(mix.file_name)[0]))

It’s also possible to use tqdm to print the progress to the console. This is useful because running through an entire data set can take a while. Here’s a more advanced example using some other options as well:

import os

import nussl
from tqdm import tqdm

musdb_path = '/path/to/MUSDB18'  # the MUSDB18 directory on disc

# Only run on the 'test' folder (this has 50 songs)
musdb_gen = nussl.datasets.musdb18(musdb_path, subset='test', check_hash=False)

# Tell tqdm the number of files we're running on so it can estimate a completion time
for mix, drums, bass, other, vox in tqdm(musdb_gen, total=50):

    # Run an algorithm on the MUSDB18 files and save to disc
    r = nussl.Repet(mix)
    r.run()
    bg_est, fg_est = r.make_audio_signals()
    bg_est.write_audio_to_file('{}_bg.wav'.format(os.path.splitext(mix.file_name)[0]))
    fg_est.write_audio_to_file('{}_fg.wav'.format(os.path.splitext(mix.file_name)[0]))
Parameters:
  • directory (str) – Top-level directory for the MUSDB18 data set.
  • check_hash (bool, str) – In the case that there is a mismatch between the expected and calculated hash, if this parameter is True (a bool) an exception is raised, and if this parameter is 'warn' (a string) a warning is printed to the console. If this parameter is False, the hash will not be calculated for this directory, i.e., no hash check is performed.
  • subset (float, list, str, None) – This parameter determines how to make a subset of the audio files in the data set. There are four ways to use it, depending on what type this parameter takes:
      1) If subset is a float, it is expected to be in the range [0.0, 1.0], and the first subset × 100% of the audio files are returned (e.g., 0.25 returns the first 25%).
      2) If subset is a list, it is expected to be a list of indices (as ints). Only the audio files that correspond to those indices are produced.
      3) If subset is a str, only audio files with that string somewhere in their directory name are included.
      4) If subset is None, the whole data set is traversed.
  • shuffle (bool) – Whether the data set should be shuffled.
  • seed (int, 1-d array_like) – Seed for numpy’s random number generator used for shuffling.

Yields: (tuple) – A tuple of five AudioSignal objects, with audio loaded for each source. In the tuple, they are returned in the following order: (mixture, drums, bass, other, vox).
nussl.datasets.dsd100(directory, check_hash=True, subset=None, shuffle=False, seed=None)

Not implemented yet.

Parameters:
  • directory
  • check_hash
  • subset
  • shuffle
  • seed

Returns: