AudioSignal Class

AudioSignal is the main entry and exit point for all source separation algorithms in nussl.

The AudioSignal class is a container for all things related to audio data. It contains utilities for input/output, time-series and frequency domain manipulation, plotting, and much more. The AudioSignal class is used in all source separation objects in nussl.

An AudioSignal object stores time-series audio data as a 2D numpy array in audio_data (see audio_data for details) and Short-Time Fourier Transform data as a 3D numpy array in stft_data (see stft_data for details).

There are a few options for initializing an AudioSignal object. The first is to initialize an empty AudioSignal object, with no parameters:

>>> signal = nussl.AudioSignal()

In this case, there is no data stored in audio_data or in stft_data, though these attributes can be updated at any time after the object has been created.

Additionally, an AudioSignal object can be loaded with exactly one of the following:
  1. A path to an input audio file (see load_audio_from_file() for details).
  2. A numpy array of 1D or 2D real-valued time-series audio data.
  3. A numpy array of 2D or 3D complex-valued time-frequency STFT data.

AudioSignal will raise an exception if it is initialized with more than one of the above at once.

Here are examples of all three of these cases:

import numpy as np
import nussl

# Initializing an empty AudioSignal object:
sig_empty = nussl.AudioSignal()

# Initializing from a path:
file_path = 'my/awesome/mixture.wav'
sig_path = nussl.AudioSignal(file_path)

# Initializing with a 1D or 2D numpy array containing audio data:
aud_1d = np.sin(np.linspace(0.0, 1.0, 48000))
sig_1d = nussl.AudioSignal(audio_data_array=aud_1d, sample_rate=48000)

# FYI: The shape doesn't matter, nussl will correct for it
aud_2d = np.array([aud_1d, -2 * aud_1d])
sig_2d = nussl.AudioSignal(audio_data_array=aud_2d)

# Initializing with a 2D or 3D numpy array containing STFT data:
# np.random.rand takes the dimensions as separate arguments, not as a tuple
stft_2d = np.random.rand(1024, 3000) + 1j * np.random.rand(1024, 3000)
sig_stft_2d = nussl.AudioSignal(stft=stft_2d)

# Two channels of STFT data:
stft_3d = nussl.utils.complex_randn((1024, 3000, 2))
sig_stft_3d = nussl.AudioSignal(stft=stft_3d)

# Initializing with more than one of the above methods will raise an exception:
sig_exception = nussl.AudioSignal(audio_data_array=aud_2d, stft=stft_2d)

When initializing from a path, AudioSignal can read many types of audio files, provided that your computer has the backends installed to understand the corresponding codecs. nussl uses librosa’s load function to read in audio data. See librosa’s documentation for details: https://github.com/librosa/librosa#audioread

The sample rate of an AudioSignal object is set upon initialization. If initializing from a path, the sample rate of the AudioSignal object inherits the native sample rate from the file. If initialized via method 2 or 3 from above, the sample rate is passed in as an optional argument. In these cases, with no sample rate explicitly defined, the default sample rate is 44.1 kHz (CD quality). If this argument is provided when reading from a file and the provided sample rate does not match the native sample rate of the file, AudioSignal will resample the data from the file so that it matches the provided sample rate.
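
The sample-rate rules in this paragraph can be summarized in a short sketch. The function name resolve_sample_rate and its return convention are hypothetical, chosen only for illustration; this is not nussl's own code.

```python
DEFAULT_SAMPLE_RATE = 44100  # 44.1 kHz, the default stated above


def resolve_sample_rate(native_rate=None, requested_rate=None):
    """Return (sample_rate, needs_resampling) following the rules above.

    native_rate: rate stored in the file, or None when initializing from an array.
    requested_rate: the optional sample_rate argument, or None if omitted.
    """
    if native_rate is None:
        # Initialized from an array: use the argument, else the default.
        rate = requested_rate if requested_rate is not None else DEFAULT_SAMPLE_RATE
        return rate, False
    if requested_rate is None or requested_rate == native_rate:
        # Initialized from a file: inherit the file's native rate.
        return native_rate, False
    # The file's rate and the requested rate differ: the audio would be resampled.
    return requested_rate, True
```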

Once initialized with a single type of data (time-series or time-frequency), there are methods to compute an STFT from time-series data (stft()) and vice versa (istft()).

Notes

There is no guarantee that data in audio_data corresponds to data in stft_data. E.g., when an AudioSignal object is initialized with audio_data of an audio mixture, its stft_data is None until stft() is called. Once stft() is called and a mask is applied to stft_data (via some algorithm), the audio_data in this AudioSignal object still contains data from the original mixture that it was initialized with even though stft_data contains altered data. (To hear the results, simply call istft() on the AudioSignal object.) It is up to the user to keep track of the contents of audio_data and stft_data.

See also

For a walk-through of AudioSignal features, see AudioSignal Basics and Spectrograms and STFTs.

class nussl.core.audio_signal.AudioSignal(path_to_input_file=None, audio_data_array=None, stft=None, label=None, sample_rate=None, stft_params=None, offset=0, duration=None)
Parameters:
  • path_to_input_file (str) – Path to an input file to load upon initialization. Audio gets loaded into audio_data.
  • audio_data_array (np.ndarray) – 1D or 2D numpy array containing a real-valued, time-series representation of the audio.
  • stft (np.ndarray) – 2D or 3D numpy array containing pre-computed complex-valued STFT data.
  • label (str) – A label for this AudioSignal object.
  • offset (float) – Starting point of the section to be extracted (in seconds) if initializing from a file.
  • duration (float) – Length of the signal to read from the file (in seconds). Defaults to full length of the signal.
  • sample_rate (int) – Sampling rate of this AudioSignal object.
audio_data

Real-valued, uncompressed, time-domain representation of the audio. 2D numpy array with shape (n_channels, n_samples). None by default. Stored as an array of floats. It is possible to change how much of audio_data is accessible outside of this AudioSignal object by changing the ‘active region’. See set_active_region_to_default() for more details.

Type:np.ndarray
path_to_input_file

Path to the input file. None if this AudioSignal never loaded a file, i.e., initialized with a np.ndarray.

Type:str
sample_rate

Sample rate of this AudioSignal object.

Type:int
stft_data

Complex-valued, frequency-domain representation of audio calculated by stft() or provided upon initialization. 3D numpy array with shape (n_frequency_bins, n_hops, n_channels). None by default.

Type:np.ndarray
stft_params

Container for all settings for doing an STFT. Has the same lifespan as the AudioSignal object.

Type:StftParams
label

A label for this AudioSignal object.

Type:str
signal_length

PROPERTY

(int): Number of samples in the active region of audio_data. The length of the audio signal represented by this object in samples.

See also

set_active_region_to_default() for information about active regions.

entire_signal_length

PROPERTY

(int): Number of samples in all of audio_data regardless of active regions.

See also

set_active_region_to_default() for information about active regions.

signal_duration

PROPERTY

(float): Duration of the active region of audio_data in seconds. The length of the audio signal represented by this object in seconds.

See also

set_active_region_to_default() for information about active regions.
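
signal_length and signal_duration describe the same span in different units. A minimal sketch of the conversion, assuming the duration is simply the sample count divided by the sample rate (the helper name is hypothetical):

```python
def duration_in_seconds(n_samples, sample_rate):
    """Convert a length in samples to a duration in seconds."""
    return n_samples / float(sample_rate)


# Half a second of audio at 44.1 kHz is 22050 samples.
half_second = duration_in_seconds(22050, 44100)
```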

entire_signal_duration

PROPERTY

(float): Duration of audio in seconds regardless of active regions.

See also

set_active_region_to_default() for information about active regions.

num_channels

PROPERTY

(int): Number of channels this AudioSignal has. Defaults to returning number of channels in audio_data. If that is None, returns number of channels in stft_data. If both are None then returns None.

is_mono

PROPERTY

Returns:(bool) – Whether or not this signal is mono (i.e., has exactly one channel). First looks at audio_data, then (if that’s None) looks at stft_data.
is_stereo

PROPERTY

Returns:(bool) – Whether or not this signal is stereo (i.e., has exactly two channels). First looks at audio_data, then (if that’s None) looks at stft_data.
audio_data

Stored as a np.ndarray, audio_data houses the raw PCM waveform data in the AudioSignal. None by default, can be initialized at instantiation or set at any time by accessing this attribute or calling load_audio_from_array(). It is recommended to set audio_data by using load_audio_from_array() if this AudioSignal has been initialized without any audio or STFT data.

The audio data is stored with shape (n_channels, n_samples) as an array of floats.

See also

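  • load_audio_from_file() to load audio into audio_data after initialization.

  • load_audio_from_array() to load audio into audio_data after initialization.

  • signal_duration and signal_length for the length of the audio data in seconds and samples, respectively.

  • stft() to calculate an STFT from this data, and istft() to calculate the inverse STFT and put it in audio_data.

  • peak_normalize() to apply gain so that the absolute max value is exactly 1.0.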

Notes

  • This attribute only returns values within the active region. For more information see set_active_region_to_default(). When setting this attribute, the active region is reset to default.

  • If audio_data and stft_data are both set, changes made to one of them are not instantly reflected in the other. To propagate changes, either call stft() or istft().

  • If audio_data is set with an improperly transposed array, it will automatically be transposed so that it is stored the expected way. A warning will be displayed on the console.

Raises: AudioSignalException if set with anything other than a finite-valued, 2D np.ndarray.

Returns: (np.ndarray)

Real-valued, uncompressed, time-domain representation of the audio. 2D numpy array with shape (n_channels, n_samples). None by default, this can be initialized at instantiation. By default audio data is stored as an array of floats.
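
The automatic transposition mentioned in the notes can be pictured with a small sketch. It uses plain Python lists rather than numpy arrays, and the rule "more rows than columns means the array is transposed" is an assumption made for illustration, not a statement of nussl's exact check. The function name is hypothetical.

```python
import warnings


def ensure_channels_first(audio):
    """Return `audio` (a list of rows) laid out as (n_channels, n_samples)."""
    n_rows, n_cols = len(audio), len(audio[0])
    if n_rows > n_cols:
        # Assumed heuristic: audio normally has far more samples than channels.
        warnings.warn("audio data looks transposed; transposing it")
        return [list(column) for column in zip(*audio)]
    return [list(row) for row in audio]


# Three samples of two-channel audio, stored samples-first by mistake.
samples_first = [[0.1, -0.1], [0.2, -0.2], [0.3, -0.3]]
channels_first = ensure_channels_first(samples_first)
```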
stft_data

Stored as a np.ndarray, stft_data houses complex-valued data computed from a Short-time Fourier Transform (STFT) of audio data in the AudioSignal. None by default, this AudioSignal object can be initialized with STFT data upon initialization or it can be set at any time.

The STFT data is stored with shape (n_frequency_bins, n_hops, n_channels) as an array of complex floats.

See also

  • stft() to calculate an STFT from audio_data, and istft() to calculate the inverse STFT from this attribute and put it in audio_data.

  • magnitude_spectrogram_data to calculate and get the magnitude spectrogram from stft_data, and power_spectrogram_data to calculate and get the power spectrogram from stft_data.

Notes

  • If audio_data and stft_data are both set, changes made to one of them are not instantly reflected in the other. To propagate changes, either call stft() or istft().

  • A two-dimensional array assigned to stft_data is expanded so that it has the expected shape (n_frequency_bins, n_hops, n_channels).

Raises: AudioSignalException if set with an np.ndarray with one dimension or more than three dimensions.

Returns: (np.ndarray)

Complex-valued, time-frequency representation of the audio. 3D numpy array with shape (n_frequency_bins, n_hops, n_channels). None by default.
file_name

PROPERTY

(str): The name of the file with extension (NOT the full path).

Notes

This will return None if this AudioSignal object was not loaded from a file.

See also

path_to_input_file for the full path.

sample_rate

PROPERTY

Sample rate associated with the audio_data for this AudioSignal object. If audio was read from a file, the sample rate will be set to the sample rate associated with the file. If this AudioSignal object was initialized from an array (either through the constructor or through load_audio_from_array()) then the sample rate is set upon init.

Notes

This property is read-only and cannot be set directly.

Returns:(int) Sample rate for this AudioSignal object. Cannot be changed directly. Can only be set upon initialization or by using resample().
time_vector

PROPERTY

Returns:(np.ndarray) – A 1D np.ndarray with timestamps (in seconds) for each sample in audio_data.
freq_vector

PROPERTY

Raises:AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns:(np.ndarray) – A 1D numpy array with frequency values that correspond to each frequency bin (vertical axis) for stft_data. Assumes linearly spaced frequency bins.
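
As an illustration of linearly spaced frequency bins, the sketch below builds such a vector with plain Python. It assumes a one-sided STFT whose bins run from 0 Hz up to and including the Nyquist frequency; the function name is hypothetical and the result may differ from nussl's in detail.

```python
def linear_freq_vector(num_bins, sample_rate):
    """Frequency (Hz) of each bin, linearly spaced from 0 to Nyquist."""
    if num_bins < 2:
        raise ValueError("need at least two frequency bins")
    nyquist = sample_rate / 2.0
    return [k * nyquist / (num_bins - 1) for k in range(num_bins)]


# Five bins at a sample rate of 8 kHz are spaced 1 kHz apart.
freqs = linear_freq_vector(5, 8000)
```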
time_bins_vector

PROPERTY

Raises:AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns:(np.ndarray) – A 1D numpy array with time values that correspond to each time bin (horizontal/time axis) for stft_data.
stft_length

PROPERTY

Raises:AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns:(int) – The length of stft_data along the time axis. In units of hops.
num_fft_bins

PROPERTY

Raises:AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns:(int) – Number of FFT bins in stft_data
active_region_is_default

PROPERTY

Returns:(bool) – True if active region is the full length of audio_data.
power_spectrogram_data

PROPERTY

(np.ndarray): Returns a real-valued np.ndarray with power spectrogram data. The power spectrogram is defined as |STFT|^2, the element-wise squared magnitude of the STFT. Same shape as stft_data.

Raises:AudioSignalException – if stft_data is None. Run stft() before accessing this.

magnitude_spectrogram_data

PROPERTY

(np.ndarray): Returns a real-valued np.ndarray with magnitude spectrogram data. The magnitude spectrogram is defined as Abs(STFT), the element-wise absolute value of every item in the STFT. Same shape as stft_data.

Raises:AudioSignalException – if stft_data is None. Run stft() before accessing this.
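
A small worked example of the two spectrogram definitions, using Python's built-in complex numbers in place of a numpy array. It assumes the power spectrogram is the squared magnitude, which is what makes it real-valued.

```python
# Three complex STFT entries standing in for stft_data.
stft_bins = [3 + 4j, -1j, 0j]

# Magnitude spectrogram: element-wise absolute value.
magnitude = [abs(z) for z in stft_bins]

# Power spectrogram: element-wise squared magnitude.
power = [abs(z) ** 2 for z in stft_bins]
```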

has_data

PROPERTY

Returns:(bool) – False if audio_data and stft_data are both empty; otherwise True.
has_stft_data

PROPERTY

Returns:(bool) – False if stft_data is empty; otherwise True.
has_audio_data

PROPERTY

Returns:(bool) – False if audio_data is empty; otherwise True.
load_audio_from_file(input_file_path, offset=0, duration=None, new_sample_rate=None)

Loads an audio signal into memory from a file on disk. The audio is stored in AudioSignal as a np.ndarray of floats. The sample rate is read from the file, and this AudioSignal object’s sample rate is set from it. If new_sample_rate is neither None nor equal to the sample rate of the file, the audio will be resampled to new_sample_rate. After reading the audio data into memory, the active region is set to default.

offset and duration allow the user to determine how much of the audio is read from the file. If those are non-default, then only the requested portion is stored in audio_data (unlike with the active region, which has the entire audio data stored in memory but only allows access to a subset of the audio).

Parameters:
  • input_file_path (str) – Path to input file.
  • offset (float) – The starting point of the section to be extracted (in seconds). Defaults to 0 seconds (i.e., the very beginning of the file).
  • duration (float) – Length of signal to load (in seconds). A value of 0 means read the whole file. Defaults to the full length of the signal.
  • new_sample_rate (int) – If this parameter is neither None nor equal to the sample rate of the input file, the audio data will be resampled to this sample rate.
load_audio_from_array(signal, sample_rate=44100)

Loads an audio signal from a np.ndarray. sample_rate is the sample rate of the signal.

Notes

Only accepts float arrays and int arrays of depth 16-bits.

Parameters:
  • signal (np.ndarray) – Array containing the audio signal sampled at sample_rate.
  • sample_rate (int) – The sample rate of signal. Default is constants.DEFAULT_SAMPLE_RATE (44.1 kHz).
write_audio_to_file(output_file_path, sample_rate=None, verbose=False)

Outputs the audio signal data in audio_data to a file at output_file_path with a sample rate of sample_rate.

Parameters:
  • output_file_path (str) – Filename where output file will be saved.
  • sample_rate (int) – The sample rate to write the file at. Default is sample_rate.
  • verbose (bool) – Print out a message if writing the file was successful.
set_active_region(start, end)

Determines the bounds of what gets returned when you access audio_data. None of the data in audio_data is discarded when you set the active region, it merely becomes inaccessible until the active region is set back to default (i.e., the full length of the signal).

This is useful for reusing a single AudioSignal object to do multiple operations on only select parts of the audio data.

Warning

Many functions will raise exceptions while the active region is not default. Be aware that adding, subtracting, concatenating, truncating, and other utilities are not available when the active region is not default.

Examples

>>> import nussl
>>> import numpy as np
>>> n = nussl.DEFAULT_SAMPLE_RATE  # 1 second of audio at 44.1kHz
>>> np_sin = np.sin(np.linspace(0, 100 * 2 * np.pi, n))  # sine wave @ 100 Hz
>>> sig = nussl.AudioSignal(audio_data_array=np_sin)
>>> sig.signal_duration
1.0
>>> sig.set_active_region(0, n // 2)
>>> sig.signal_duration
0.5
Parameters:
  • start (int) – Beginning of active region (in samples). Cannot be less than 0.
  • end (int) – End of active region (in samples). Cannot be larger than signal_length.
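
The idea behind the active region can be shown with a toy class. This is a stand-in written for illustration only, not nussl's AudioSignal. The class name and the choice to treat end as exclusive are assumptions.

```python
class ActiveRegionDemo:
    """Toy stand-in for the active-region behaviour described above."""

    def __init__(self, samples):
        self._samples = list(samples)
        self.set_active_region_to_default()

    def set_active_region(self, start, end):
        if start < 0 or end > len(self._samples):
            raise ValueError("active region must lie inside the signal")
        self._start, self._end = start, end

    def set_active_region_to_default(self):
        # The default active region covers the whole signal.
        self._start, self._end = 0, len(self._samples)

    @property
    def audio_data(self):
        # Only the active region is visible; nothing is discarded.
        return self._samples[self._start:self._end]


demo = ActiveRegionDemo([0.0, 0.1, 0.2, 0.3])
demo.set_active_region(1, 3)
visible = demo.audio_data
demo.set_active_region_to_default()
restored = demo.audio_data
```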
set_active_region_to_default()

Resets the active region of this AudioSignal object to its default value of the entire audio_data array.

See also

set_active_region() for an explanation of active regions within an AudioSignal.

next_window_generator(window_size, hop_size, convert_to_samples=False)

Not Implemented

Raises:

NotImplemented

Parameters:
  • window_size
  • hop_size
  • convert_to_samples

Returns:

stft(window_length=None, hop_length=None, window_type=None, n_fft_bins=None, remove_reflection=True, overwrite=True, use_librosa=False)

Computes the Short-Time Fourier Transform (STFT) of audio_data. The result is stored in stft_data if stft_data is None prior to running this function or if overwrite is True.

Warning

If overwrite=True (default) this will overwrite any data in stft_data!

Parameters:
  • window_length (int) – Amount of time (in samples) to do an FFT on
  • hop_length (int) – Amount of time (in samples) to skip ahead for the new FFT
  • window_type (str) – Type of scaling to apply to the window.
  • n_fft_bins (int) – Number of FFT bins per each hop
  • remove_reflection (bool) – Should remove reflection above Nyquist
  • overwrite (bool) – Overwrite stft_data with current calculation
  • use_librosa (bool) – Use librosa’s stft function
Returns:

(np.ndarray) Calculated, complex-valued STFT from audio_data, 3D numpy array with shape (n_frequency_bins, n_hops, n_channels).
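
To make the parameters concrete, here is a deliberately naive single-channel STFT in plain Python. It is a sketch under stated assumptions, not nussl's implementation: it always uses a Hann window, sets the FFT size equal to window_length, applies no zero-padding (so the number of hops may differ from nussl's), and computes each frame with a direct DFT. The function name is hypothetical.

```python
import cmath
import math


def naive_stft(samples, window_length, hop_length, remove_reflection=True):
    """Return rows of frequency bins and columns of hops for one channel."""
    # Hann window; window_type is not modelled in this sketch.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_length)
              for n in range(window_length)]
    # Dropping the reflection keeps only bins from 0 Hz up to Nyquist.
    n_bins = window_length // 2 + 1 if remove_reflection else window_length
    starts = range(0, len(samples) - window_length + 1, hop_length)
    frames = [[samples[s + n] * window[n] for n in range(window_length)]
              for s in starts]
    # Direct DFT of every windowed frame: result[k][t] is bin k at hop t.
    return [[sum(x * cmath.exp(-2j * math.pi * k * n / window_length)
                 for n, x in enumerate(frame))
             for frame in frames]
            for k in range(n_bins)]


# A cosine that completes exactly two cycles per 8-sample window.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
spectrum = naive_stft(tone, window_length=8, hop_length=4)
```

With these inputs the result has 5 frequency bins and 3 hops, and the energy of every hop peaks in bin 2.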

istft(window_length=None, hop_length=None, window_type=None, overwrite=True, use_librosa=False, truncate_to_length=None)

Computes and returns the inverse Short Time Fourier Transform (iSTFT).

The result is stored in audio_data if audio_data is None prior to running this function or if overwrite is True.

Warning

If overwrite=True (default) this will overwrite any data in audio_data!

Parameters:
  • window_length (int) – Amount of time (in samples) to do an FFT on
  • hop_length (int) – Amount of time (in samples) to skip ahead for the new FFT
  • window_type (str) – Type of scaling to apply to the window.
  • overwrite (bool) – Overwrite audio_data with current calculation
  • use_librosa (bool) – Use librosa’s istft function
  • truncate_to_length (int) – truncate resultant signal to specified length. Default None.
Returns:

(np.ndarray) Calculated, real-valued iSTFT from stft_data, 2D numpy array with shape (n_channels, n_samples).

apply_mask(mask, overwrite=False)

Applies the input mask to the time-frequency representation in this AudioSignal object and returns a new AudioSignal object with the mask applied.

Parameters:
  • mask (MaskBase-derived object) – A MaskBase-derived object containing a mask.
  • overwrite (bool) – If True, this will alter stft_data in self. If False, this function will create a new AudioSignal object with the mask applied.
Returns:

A new AudioSignal object with the input mask applied to the STFT, if and only if overwrite is False.
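
Conceptually, applying a mask is an element-wise multiplication of the time-frequency data by the mask values. The sketch below shows this for one channel with nested Python lists. In nussl the mask is a MaskBase-derived object rather than a bare list, so the function here is a hypothetical simplification.

```python
def apply_mask_2d(stft, mask):
    """Multiply each STFT entry by the mask entry at the same position."""
    return [[z * m for z, m in zip(stft_row, mask_row)]
            for stft_row, mask_row in zip(stft, mask)]


# Two frequency bins by two hops; the binary mask keeps the diagonal.
stft = [[1 + 1j, 2j], [3 + 0j, 4 - 4j]]
binary_mask = [[1, 0], [0, 1]]
masked = apply_mask_2d(stft, binary_mask)
```

The input list is left untouched, which mirrors the overwrite=False behaviour of returning new data.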

plot_time_domain(channel=None, x_label_time=True, title=None, file_path_name=None)

Plots a graph of the time domain audio signal.

Parameters:
  • channel (int) – The index of the single channel to be plotted
  • x_label_time (bool) – Label the x axis with time (True) or samples (False)
  • title (str) – The title of the audio signal plot
  • file_path_name (str) – The output path of where the plot is saved, including the file name
plot_spectrogram(file_name=None, ch=None)

Plots the power spectrogram calculated from audio_data.

Parameters:
  • file_name (str) – Path to the output file that will be written.
  • ch (int) – If provided, this function will only make a plot of the given channel.
concat(other)

Concatenate two AudioSignal objects (by concatenating audio_data).

Puts other.audio_data after audio_data.

Raises:AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.
Parameters:other (AudioSignal) – AudioSignal to concatenate with the current one.
truncate_samples(n_samples)

Truncates the signal leaving only the first n_samples samples. This can only be done if self.active_region_is_default is True.

Raises:AudioSignalException – If n_samples > self.signal_length or self.active_region_is_default is False.
Parameters:n_samples – (int) number of samples that will be left.
truncate_seconds(n_seconds)

Truncates the signal leaving only the first n_seconds. This can only be done if self.active_region_is_default is True.

Raises:AudioSignalException – If n_seconds > self.signal_duration or self.active_region_is_default is False.
Parameters:n_seconds – (float) number of seconds to truncate audio_data.
crop_signal(before, after)

Get rid of samples before and after the signal on all channels. Contracts the length of audio_data by before + after. Useful to get rid of zero padding after the fact.

Parameters:
  • before – (int) number of samples to remove at beginning of self.audio_data
  • after – (int) number of samples to remove at end of self.audio_data
zero_pad(before, after)

Adds zeros before and after the signal to all channels. Extends the length of self.audio_data by before + after.

Raises:

Exception – If self.active_region_is_default is False.

Parameters:
  • before – (int) number of zeros to be put before the current contents of self.audio_data
  • after – (int) number of zeros to be put after the current contents of self.audio_data
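
crop_signal() and zero_pad() are inverse operations on the sample axis. A minimal sketch with plain lists laid out as (n_channels, n_samples); the function bodies are illustrative and not taken from nussl.

```python
def zero_pad(audio, before, after):
    """Add zeros before and after every channel."""
    return [[0.0] * before + list(channel) + [0.0] * after for channel in audio]


def crop_signal(audio, before, after):
    """Remove samples from the start and end of every channel."""
    return [list(channel[before:len(channel) - after]) for channel in audio]


mono = [[1.0, 2.0]]
padded = zero_pad(mono, 2, 1)
cropped = crop_signal(padded, 2, 1)  # undoes the padding
```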
peak_normalize(overwrite=True)

Normalizes audio_data so that its maximum absolute value is 1.0.

Warning

If audio_data is not represented as floats this will convert the representation to floats!
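
Peak normalization divides every sample by the largest absolute value in the signal. A sketch for a single channel stored as a Python list. The handling of an all-zero signal is an assumption made here so the example never divides by zero.

```python
def peak_normalize(samples):
    """Scale samples so that the largest absolute value becomes 1.0."""
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    return [x / peak for x in samples]


normalized = peak_normalize([0.5, -0.25, 0.125])
```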

add(other)

Adds two audio signal objects.

This does element-wise addition on the audio_data array.

Raises:AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.
Parameters:other (AudioSignal) – Other AudioSignal to add.
Returns:(AudioSignal) – New AudioSignal object with the sum of self and other.
subtract(other)

Subtracts two audio signal objects.

This does element-wise subtraction on the audio_data array.

Raises:AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.
Parameters:other (AudioSignal) – Other AudioSignal to subtract.
Returns:(AudioSignal) – New AudioSignal object with the difference between self and other.
audio_data_as_ints(bit_depth=16)

Returns audio_data as a numpy array of signed ints with a specified bit-depth.

Available bit-depths are: 8-, 16-, 24-, or 32-bits.

Raises:TypeError – If bit_depth is not one of the above bit-depths.

Notes

audio_data is regularly stored as an array of floats. This will not affect audio_data.

Parameters:bit_depth (int) – Bit depth of the integer array that will be returned.
Returns:(np.ndarray) – Integer representation of audio_data.
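
One common way to turn float samples in [-1.0, 1.0] into signed integers is to scale by the largest positive value of the target bit depth. The sketch below assumes that convention, plus clipping of out-of-range input. nussl's exact scaling may differ, and the function name is hypothetical.

```python
def floats_to_ints(samples, bit_depth=16):
    """Convert floats in [-1.0, 1.0] to signed integers of `bit_depth` bits."""
    if bit_depth not in (8, 16, 24, 32):
        raise TypeError("bit_depth must be 8, 16, 24, or 32")
    full_scale = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16 bits
    # Clip first so that out-of-range input cannot overflow the integer range.
    clipped = [max(-1.0, min(1.0, x)) for x in samples]
    return [int(round(x * full_scale)) for x in clipped]


as_int16 = floats_to_ints([0.0, 1.0, -1.0, 2.0])
```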
make_empty_copy(verbose=True)

Makes a copy of this AudioSignal object with audio_data and stft_data initialized to np.ndarrays of the same size, but populated with zeros.

Returns:(AudioSignal) – An AudioSignal object with audio_data and stft_data initialized to np.ndarrays of the same size, but populated with zeros.
make_copy_with_audio_data(audio_data, verbose=True)

Makes a copy of this AudioSignal object with audio_data initialized to the input audio_data numpy array. The stft_data of the new AudioSignal object is None.

Parameters:
  • audio_data (np.ndarray) – Audio data to be put into the new AudioSignal object.
  • verbose (bool) – If True prints warnings. If False, outputs nothing.
Returns:

(AudioSignal) – A copy of this AudioSignal object with audio_data initialized to the input audio_data numpy array.

make_copy_with_stft_data(stft_data, verbose=True)

Makes a copy of this AudioSignal object with stft_data initialized to the input stft_data numpy array. The audio_data of the new AudioSignal object is None.

Parameters:
  • stft_data (np.ndarray) – STFT data to be put into the new AudioSignal object.
  • verbose (bool) – If True prints warnings. If False, outputs nothing.
Returns:

(AudioSignal) – A copy of this AudioSignal object with stft_data initialized to the input stft_data numpy array.

to_json()

Converts this AudioSignal object to JSON.

See also

from_json()

Returns:(str) – JSON representation of the current AudioSignal object.
static from_json(json_string)

Creates a new AudioSignal object from a JSON encoded AudioSignal string.

For best results, json_string should be created from AudioSignal.to_json().

See also

to_json()

Parameters:json_string (string) – a json encoded AudioSignal string
Returns:(AudioSignal) – an AudioSignal object based on the parameters in JSON string
rms()

Calculates the root-mean-square of audio_data.

Returns:(float) – Root-mean-square of audio_data.
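
The root-mean-square is the square root of the mean of the squared samples. A sketch for a single channel held in a Python list. How nussl combines multiple channels is not covered here.

```python
import math


def rms(samples):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


level = rms([3.0, -4.0, 0.0, 0.0])  # sqrt((9 + 16) / 4)
```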
get_closest_frequency_bin(freq)

Returns the index of the element in stft_data closest to freq. Assumes linearly spaced frequency bins.

Parameters:freq (int) – Frequency to retrieve in Hz.
Returns:(int) index of closest frequency to input freq

Example

# Make a low pass filter starting around 1200 Hz
signal = nussl.AudioSignal('path_to_song.wav')
signal.stft()
idx = signal.get_closest_frequency_bin(1200)  # 1200 Hz
signal.stft_data[idx:, :, :] = 0.0  # eliminate everything above idx
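
The lookup itself amounts to finding the bin whose frequency is nearest to the requested one. A dependency-free sketch, assuming a list of linearly spaced bin frequencies like the one freq_vector describes (the function name and the example values are hypothetical):

```python
def closest_frequency_bin(freq_vector, freq):
    """Index of the entry in freq_vector that is nearest to freq (in Hz)."""
    return min(range(len(freq_vector)), key=lambda k: abs(freq_vector[k] - freq))


# Five bins, 1 kHz apart, as for an 8 kHz sample rate.
bin_freqs = [0.0, 1000.0, 2000.0, 3000.0, 4000.0]
idx_1200 = closest_frequency_bin(bin_freqs, 1200)
idx_1600 = closest_frequency_bin(bin_freqs, 1600)
```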
apply_gain(value)

Applies a gain to audio_data.

Parameters:value (float) – amount to multiply self.audio_data by
Returns:(AudioSignal) – This AudioSignal object with the gain applied.
resample(new_sample_rate)

Resamples the data in audio_data to the new sample rate provided by new_sample_rate. If new_sample_rate is the same as sample_rate, nothing happens.

Parameters:new_sample_rate (int) – The new sample rate of audio_data.
get_channel(n)

Gets audio data of n-th channel from audio_data as a 1D np.ndarray of shape (n_samples,).

Parameters:n (int) – index of channel to get. 0-based

See also

get_stft_channel() to retrieve a single channel from stft_data.

Raises:AudioSignalException – If not 0 <= n < self.num_channels.
Returns:(np.array) – The audio data in the n-th channel of the signal, 1D
get_channels()

Generator that will loop through channels of audio_data.

Yields:(np.array) – The audio data in the next channel of this signal as a 1D np.ndarray.
get_stft_channel(n)

Returns STFT data of n-th channel from stft_data as a 2D np.ndarray.

Parameters:n – (int) index of stft channel to get. 0-based

Raises:AudioSignalException – If not 0 <= n < self.num_channels.
Returns:(np.array) – the STFT data in the n-th channel of the signal, 2D
get_stft_channels()

Generator that will loop through channels of stft_data.

Yields:(np.array) – The STFT data in the next channel of this signal as a 2D np.ndarray.
make_audio_signal_from_channel(n)

Makes a new AudioSignal object with data from channel n.

Parameters:n (int) – index of channel to make a new signal from. 0-based
Returns:(AudioSignal) new AudioSignal object with only data from channel n.
get_power_spectrogram_channel(n)

Returns the n-th channel from self.power_spectrogram_data.

Raises:Exception – If not 0 <= n < self.num_channels.
Parameters:n – (int) index of power spectrogram channel to get. 0-based
Returns:(np.array) – the power spectrogram data in the n-th channel of the signal, 2D
get_magnitude_spectrogram_channel(n)

Returns the n-th channel from self.magnitude_spectrogram_data.

Raises:Exception – If not 0 <= n < self.num_channels.
Parameters:n – (int) index of magnitude spectrogram channel to get. 0-based
Returns:(np.array) – the magnitude spectrogram data in the n-th channel of the signal, 2D
to_mono(overwrite=False, keep_dims=False)

Converts audio_data to mono by averaging every sample across channels.

Parameters:
  • overwrite (bool) – If True this function will overwrite audio_data.
  • keep_dims (bool) – If False this function will return a 1D array, else will return array with shape (1, n_samples).

Warning

If overwrite=True, this will overwrite any data in audio_data! (The default is overwrite=False.)

Returns:(np.array) – Mono-ed version of audio_data.
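
Averaging to mono means taking, at each sample position, the mean of the values across all channels. A sketch with plain lists laid out as (n_channels, n_samples). It returns a flat list, comparable to keep_dims=False, and the function is illustrative rather than nussl's code.

```python
def to_mono(audio):
    """Average the channels of `audio` sample by sample."""
    n_channels = len(audio)
    return [sum(frame) / n_channels for frame in zip(*audio)]


stereo = [[1.0, 0.0, -1.0],
          [0.0, 0.0, 1.0]]
mono = to_mono(stereo)
```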
stft_to_one_channel(overwrite=False)

Converts stft_data to a single channel by averaging every sample. The shape of stft_data will be (num_freq, num_time, 1) (where the last axis is the channel number).

Parameters:overwrite (bool) – If True this function will overwrite stft_data.

Warning

If overwrite=True, this will overwrite any data in stft_data! (The default is overwrite=False.)

Returns:(np.array) – Single channel version of stft_data.