AudioSignal Class
AudioSignal is the main entry and exit point for all source separation algorithms in nussl.
The AudioSignal class is a container for all things related to audio data. It contains utilities for input/output, time-series and frequency-domain manipulation, plotting, and much more. The AudioSignal class is used in all source separation objects in nussl.
An AudioSignal object stores time-series audio data as a 2D numpy array in audio_data (see audio_data for details) and stores Short-Time Fourier Transform data as a 3D numpy array in stft_data (see stft_data for details).
There are a few options for initializing an AudioSignal object. The first is to initialize an empty AudioSignal object, with no parameters:
>>> signal = nussl.AudioSignal()
In this case, there is no data stored in audio_data or in stft_data, though these attributes can be updated at any time after the object has been created.
Additionally, an AudioSignal object can be loaded with exactly one of the following:
- A path to an input audio file (see load_audio_from_file() for details).
- A numpy array of 1D or 2D real-valued time-series audio data.
- A numpy array of 2D or 3D complex-valued time-frequency STFT data.
AudioSignal will throw an error if it is initialized with more than one of the previous at once.
Here are examples of all three of these cases:
    import numpy as np
    import nussl

    # Initializing an empty AudioSignal object:
    sig_empty = nussl.AudioSignal()

    # Initializing from a path:
    file_path = 'my/awesome/mixture.wav'
    sig_path = nussl.AudioSignal(file_path)

    # Initializing with a 1D or 2D numpy array containing audio data:
    aud_1d = np.sin(np.linspace(0.0, 1.0, 48000))
    sig_1d = nussl.AudioSignal(audio_data_array=aud_1d, sample_rate=48000)

    # FYI: The shape doesn't matter, nussl will correct for it
    aud_2d = np.array([aud_1d, -2 * aud_1d])
    sig_2d = nussl.AudioSignal(audio_data_array=aud_2d)

    # Initializing with a 2D or 3D numpy array containing STFT data
    # (np.random.rand takes the dimensions as separate arguments, not a tuple):
    stft_2d = np.random.rand(1024, 3000) + 1j * np.random.rand(1024, 3000)
    sig_stft_2d = nussl.AudioSignal(stft=stft_2d)

    # Two channels of STFT data:
    stft_3d = nussl.utils.complex_randn((1024, 3000, 2))
    sig_stft_3d = nussl.AudioSignal(stft=stft_3d)

    # Initializing with more than one of the above methods will raise an exception:
    sig_exception = nussl.AudioSignal(audio_data_array=aud_2d, stft=stft_2d)
When initializing from a path, AudioSignal can read many types of audio files, provided that your computer has the backends installed to understand the corresponding codecs. nussl uses librosa’s load function to read in audio data. See librosa’s documentation for details: https://github.com/librosa/librosa#audioread
The sample rate of an AudioSignal object is set upon initialization. If initializing from a path, the sample rate of the AudioSignal object inherits the native sample rate from the file. If initialized via method 2 or 3 from above, the sample rate is passed in as an optional argument. In these cases, with no sample rate explicitly defined, the default sample rate is 44.1 kHz (CD quality). If this argument is provided when reading from a file and the provided sample rate does not match the native sample rate of the file, AudioSignal will resample the data from the file so that it matches the provided sample rate.
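The sample-rate rules in the paragraph above can be summarized as a small decision function. This is only a sketch in plain Python: `resolve_sample_rate` and its argument names are hypothetical and are not part of nussl, and the resampling itself is left out.

```python
DEFAULT_SAMPLE_RATE = 44100  # 44.1 kHz, the default stated in the text


def resolve_sample_rate(native_rate=None, requested_rate=None):
    """Return (effective_rate, needs_resampling).

    native_rate: rate stored in the file, or None when the signal is
        built from an array rather than a file (hypothetical convention).
    requested_rate: the optional sample_rate argument, or None if omitted.
    """
    if native_rate is None:
        # Array or STFT input: use the argument, else fall back to the default.
        rate = requested_rate if requested_rate is not None else DEFAULT_SAMPLE_RATE
        return rate, False
    if requested_rate is None or requested_rate == native_rate:
        # File input with no conflicting request: inherit the file's rate.
        return native_rate, False
    # File input with a different requested rate: data must be resampled.
    return requested_rate, True


print(resolve_sample_rate(native_rate=48000))
print(resolve_sample_rate(requested_rate=None))
print(resolve_sample_rate(native_rate=48000, requested_rate=16000))
```

The three calls cover a file read at its native rate, an array with no rate given, and a file whose rate differs from the requested one.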
Once initialized with a single type of data (time-series or time-frequency), there are methods to compute an STFT from time-series data (stft()) and vice versa (istft()).
Notes
There is no guarantee that data in audio_data corresponds to data in stft_data. E.g., when an AudioSignal object is initialized with audio_data of an audio mixture, its stft_data is None until stft() is called. Once stft() is called and a mask is applied to stft_data (via some algorithm), the audio_data in this AudioSignal object still contains data from the original mixture that it was initialized with even though stft_data contains altered data. (To hear the results, simply call istft() on the AudioSignal object.) It is up to the user to keep track of the contents of audio_data and stft_data.
See also
For a walk-through of AudioSignal features, see AudioSignal Basics and Spectrograms and STFTs.
-
class nussl.core.audio_signal.AudioSignal(path_to_input_file=None, audio_data_array=None, stft=None, label=None, sample_rate=None, stft_params=None, offset=0, duration=None)
Parameters:
- path_to_input_file (str) – Path to an input file to load upon initialization. Audio gets loaded into audio_data.
- audio_data_array (np.ndarray) – 1D or 2D numpy array containing a real-valued, time-series representation of the audio.
- stft (np.ndarray) – 2D or 3D numpy array containing pre-computed complex-valued STFT data.
- label (str) – A label for this AudioSignal object.
- offset (float) – Starting point of the section to be extracted (in seconds) if initializing from a file.
- duration (float) – Length of the signal to read from the file (in seconds). Defaults to full length of the signal.
- sample_rate (int) – Sampling rate of this AudioSignal object.
-
audio_data
Real-valued, uncompressed, time-domain representation of the audio. 2D numpy array with shape (n_channels, n_samples). None by default. Stored as an array of floats. It is possible to change how much of audio_data is accessible outside of this AudioSignal object by changing the ‘active region’. See set_active_region_to_default() for more details.
Type: np.ndarray
-
path_to_input_file
Path to the input file. None if this AudioSignal never loaded a file, i.e., initialized with a np.ndarray.
Type: str
-
sample_rate
Sample rate of this AudioSignal object.
Type: int
-
stft_data
Complex-valued, frequency-domain representation of audio calculated by stft() or provided upon initialization. 3D numpy array with shape (n_frequency_bins, n_hops, n_channels). None by default.
Type: np.ndarray
-
stft_params
Container for all settings for doing an STFT. Has same lifespan as AudioSignal object.
Type: StftParams
-
label
A label for this AudioSignal object.
Type: str
-
signal_length
PROPERTY
(int): Number of samples in the active region of audio_data. The length of the audio signal represented by this object in samples.
See also
set_active_region_to_default() for information about active regions.
-
entire_signal_length
PROPERTY
(int): Number of samples in all of audio_data regardless of active regions.
See also
set_active_region_to_default() for information about active regions.
-
signal_duration
PROPERTY
(float): Duration of the active region of audio_data in seconds. The length of the audio signal represented by this object in seconds.
See also
set_active_region_to_default() for information about active regions.
-
entire_signal_duration
PROPERTY
(float): Duration of audio in seconds regardless of active regions.
See also
set_active_region_to_default() for information about active regions.
-
num_channels
PROPERTY
(int): Number of channels this AudioSignal has. Defaults to returning number of channels in audio_data. If that is None, returns number of channels in stft_data. If both are None then returns None.
-
is_mono
PROPERTY
Returns: (bool) – Whether or not this signal is mono (i.e., has exactly one channel). First looks at audio_data, then (if that’s None) looks at stft_data.
-
is_stereo
PROPERTY
Returns: (bool) – Whether or not this signal is stereo (i.e., has exactly two channels). First looks at audio_data, then (if that’s None) looks at stft_data.
-
audio_data
Stored as a np.ndarray, audio_data houses the raw PCM waveform data in the AudioSignal. None by default, can be initialized at instantiation or set at any time by accessing this attribute or calling load_audio_from_array(). It is recommended to set audio_data by using load_audio_from_array() if this AudioSignal has been initialized without any audio or STFT data.
The audio data is stored with shape (n_channels, n_samples) as an array of floats.
See also
load_audio_from_file() to load audio into audio_data after initialization.
load_audio_from_array() to safely load audio into audio_data after initialization.
set_active_region_to_default() for more information about the active region.
signal_duration and signal_length for length of audio data in seconds and samples, respectively.
stft() to calculate an STFT from this data, and istft() to calculate the inverse STFT and put it in audio_data.
has_audio_data to check if this attribute is empty or not.
plot_time_domain() to create a plot of audio data stored in this attribute.
peak_normalize() to apply gain such that the absolute max value is exactly 1.0.
rms() to calculate the root-mean-square of audio_data.
apply_gain() to apply a gain.
get_channel() to safely retrieve a single channel in audio_data.
Notes
- This attribute only returns values within the active region. For more information see set_active_region_to_default(). When setting this attribute, the active region is reset to default.
- audio_data and stft_data are not automatically synchronized, meaning that if one of them is changed, those changes are not instantly reflected in the other. To propagate changes, either call stft() or istft().
- If audio_data is set with an improperly transposed array, it will automatically transpose it so that it is set the expected way. A warning will be displayed on the console.
Raises: AudioSignalException if set with anything other than a finite-valued, 2D np.ndarray.
Returns: (np.ndarray) Real-valued, uncompressed, time-domain representation of the audio. 2D numpy array with shape (n_channels, n_samples). None by default, this can be initialized at instantiation. By default audio data is stored as an array of floats.
-
stft_data
Stored as a np.ndarray, stft_data houses complex-valued data computed from a Short-time Fourier Transform (STFT) of audio data in the AudioSignal. None by default, this AudioSignal object can be initialized with STFT data upon initialization or it can be set at any time.
The STFT data is stored with shape (n_frequency_bins, n_hops, n_channels) as an array of complex floats.
See also
stft() to calculate an STFT from audio_data, and istft() to calculate the inverse STFT from this attribute and put it in audio_data.
magnitude_spectrogram to calculate and get the magnitude spectrogram from stft_data.
power_spectrogram to calculate and get the power spectrogram from stft_data.
get_stft_channel() to safely get a specific channel in stft_data.
Notes
- audio_data and stft_data are not automatically synchronized, meaning that if one of them is changed, those changes are not instantly reflected in the other. To propagate changes, either call stft() or istft().
- stft_data will expand a two dimensional array so that it has the expected shape (n_frequency_bins, n_hops, n_channels).
Raises: AudioSignalException if set with an np.ndarray with one dimension or more than three dimensions.
Returns: (np.ndarray) Complex-valued, time-frequency representation of the audio. 3D numpy array with shape (n_frequency_bins, n_hops, n_channels). None by default.
-
file_name
PROPERTY
(str): The name of the file with extension (NOT the full path).
Notes
This will return None if this AudioSignal object was not loaded from a file.
See also
path_to_input_file for the full path.
-
sample_rate
PROPERTY
Sample rate associated with the audio_data for this AudioSignal object. If audio was read from a file, the sample rate will be set to the sample rate associated with the file. If this AudioSignal object was initialized from an array (either through the constructor or through load_audio_from_array()) then the sample rate is set upon init.
See also
resample() to change the sample rate and resample the data in audio_data.
load_audio_from_array() to read audio from an array and set the sample rate.
nussl.constants.DEFAULT_SAMPLE_RATE, the default sample rate for nussl if not specified.
Notes
This property is read-only and cannot be set directly.
Returns: (int) Sample rate for this AudioSignal object. Cannot be changed directly. Can only be set upon initialization or by using resample().
-
time_vector
PROPERTY
Returns: (np.ndarray) – A 1D np.ndarray with timestamps (in seconds) for each sample in audio_data.
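As a rough illustration of what per-sample timestamps look like, the sketch below assumes sample n sits at n / sample_rate seconds. That spacing is an assumption made for the example (nussl may compute its endpoints differently), and `sample_timestamps` is a hypothetical helper written in plain Python rather than a nussl function.

```python
def sample_timestamps(n_samples, sample_rate):
    """Timestamp in seconds for each sample index (assumed n / sample_rate)."""
    return [n / sample_rate for n in range(n_samples)]


# Four samples at 2 Hz are spaced half a second apart.
print(sample_timestamps(4, 2))
```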
-
freq_vector
PROPERTY
Raises: AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns: (np.ndarray) – A 1D numpy array with frequency values that correspond to each frequency bin (vertical axis) for stft_data. Assumes linearly spaced frequency bins.
-
time_bins_vector
PROPERTY
Raises: AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns: (np.ndarray) – A 1D numpy array with time values that correspond to each time bin (horizontal/time axis) for stft_data.
-
stft_length
PROPERTY
Raises: AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns: (int) – The length of stft_data along the time axis. In units of hops.
-
num_fft_bins
PROPERTY
Raises: AudioSignalException – If stft_data is None. Run stft() before accessing this.
Returns: (int) – Number of FFT bins in stft_data.
-
active_region_is_default
PROPERTY
See also
set_active_region() for a description of active regions in AudioSignal.
set_active_region_to_default()
Returns: (bool) – True if active region is the full length of audio_data.
-
power_spectrogram_data
PROPERTY
(np.ndarray): Returns a real-valued np.ndarray with power spectrogram data. The power spectrogram is defined as |STFT|^2, the element-wise squared magnitude of the entries of the STFT. Same shape as stft_data.
Raises: AudioSignalException – if stft_data is None. Run stft() before accessing this.
See also
stft() to calculate the STFT before accessing this attribute.
stft_data complex-valued Short-time Fourier Transform data.
magnitude_spectrogram_data
get_power_spectrogram_channel()
-
magnitude_spectrogram_data
PROPERTY
(np.ndarray): Returns a real-valued np.ndarray with magnitude spectrogram data.
The magnitude spectrogram is defined as Abs(STFT), the element-wise absolute value of every item in the STFT. Same shape as stft_data.
Raises: AudioSignalException – if stft_data is None. Run stft() before accessing this.
See also
stft() to calculate the STFT before accessing this attribute.
stft_data complex-valued Short-time Fourier Transform data.
power_spectrogram_data
get_magnitude_spectrogram_channel()
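To make the relationship between the two spectrograms concrete, here is a minimal sketch on a handful of complex values. It assumes the power spectrogram is the squared magnitude of each STFT entry (which is what makes the result real-valued), and it uses plain Python complex numbers in place of the numpy arrays nussl stores.

```python
# A few complex STFT entries standing in for one column of stft_data.
stft_column = [3 + 4j, 0 - 2j, 1 + 0j]

# Magnitude: element-wise absolute value.
magnitude = [abs(z) for z in stft_column]

# Power: element-wise squared magnitude (assumed definition).
power = [abs(z) ** 2 for z in stft_column]

print(magnitude)
print(power)
```

Both results are real-valued and have the same length as the input, matching the "same shape as stft_data" statement above.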
-
has_data
PROPERTY
Returns: False if audio_data and stft_data are empty. Else, returns True.
-
has_stft_data
PROPERTY
Returns: False if stft_data is empty. Else, returns True.
-
has_audio_data
PROPERTY
Returns: False if audio_data is empty. Else, returns True.
-
load_audio_from_file(input_file_path, offset=0, duration=None, new_sample_rate=None)
Loads an audio signal into memory from a file on disc. The audio is stored in AudioSignal as a np.ndarray of floats. The sample rate is read from the file, and this AudioSignal object’s sample rate is set from it. If new_sample_rate is neither None nor the same as the sample rate of the file, the audio will be resampled to the sample rate provided in the new_sample_rate parameter. After reading the audio data into memory, the active region is set to default.
offset and duration allow the user to determine how much of the audio is read from the file. If those are non-default, then only the values provided will be stored in audio_data (unlike with the active region, which has the entire audio data stored in memory but only allows access to a subset of the audio).
See also
load_audio_from_array() to read audio data from a np.ndarray.
Parameters:
- input_file_path (str) – Path to input file.
- offset (float) – The starting point of the section to be extracted (seconds). Defaults to 0 seconds (i.e., the very beginning of the file).
- duration (float) – Length of signal to load in seconds. signal_length of 0 means read the whole file. Defaults to the full length of the signal.
- new_sample_rate (int) – If this parameter is neither None nor the same sample rate as provided by the input file, then the audio data will be resampled to the new sample rate dictated by this parameter.
-
load_audio_from_array(signal, sample_rate=44100)
Loads an audio signal from a np.ndarray. sample_rate is the sample rate of the signal.
See also
load_audio_from_file() to read in an audio file from disc.
Notes
Only accepts float arrays and int arrays of depth 16-bits.
Parameters:
- signal (np.ndarray) – Array containing the audio signal sampled at sample_rate.
- sample_rate (int) – The sample rate of signal. Default is constants.DEFAULT_SAMPLE_RATE (44.1 kHz).
-
write_audio_to_file(output_file_path, sample_rate=None, verbose=False)
Outputs the audio signal data in audio_data to a file at output_file_path with sample rate of sample_rate.
Parameters:
- output_file_path (str) – Filename where output file will be saved.
- sample_rate (int) – The sample rate to write the file at. Default is sample_rate.
- verbose (bool) – Print out a message if writing the file was successful.
-
set_active_region(start, end)
Determines the bounds of what gets returned when you access audio_data. None of the data in audio_data is discarded when you set the active region, it merely becomes inaccessible until the active region is set back to default (i.e., the full length of the signal).
This is useful for reusing a single AudioSignal object to do multiple operations on only select parts of the audio data.
Warning
Many functions will raise exceptions while the active region is not default. Be aware that adding, subtracting, concatenating, truncating, and other utilities are not available when the active region is not default.
Examples
>>> import nussl
>>> import numpy as np
>>> n = nussl.DEFAULT_SAMPLE_RATE  # 1 second of audio at 44.1kHz
>>> np_sin = np.sin(np.linspace(0, 100 * 2 * np.pi, n))  # sine wave @ 100 Hz
>>> sig = nussl.AudioSignal(audio_data_array=np_sin)
>>> sig.signal_duration
1.0
>>> sig.set_active_region(0, n // 2)
>>> sig.signal_duration
0.5
Parameters:
- start (int) – Beginning of active region (in samples). Cannot be less than 0.
- end (int) – End of active region (in samples). Cannot be larger than signal_length.
-
set_active_region_to_default()
Resets the active region of this AudioSignal object to its default value of the entire audio_data array.
-
next_window_generator(window_size, hop_size, convert_to_samples=False)
Not Implemented
Raises: NotImplemented
Parameters:
- window_size –
- hop_size –
- convert_to_samples –
-
stft(window_length=None, hop_length=None, window_type=None, n_fft_bins=None, remove_reflection=True, overwrite=True, use_librosa=False)
Computes the Short Time Fourier Transform (STFT) of audio_data. The results of the STFT calculation can be accessed from stft_data if stft_data is None prior to running this function or overwrite == True.
Warning
If overwrite=True (default) this will overwrite any data in stft_data!
Parameters:
- window_length (int) – Amount of time (in samples) to do an FFT on
- hop_length (int) – Amount of time (in samples) to skip ahead for the new FFT
- window_type (str) – Type of scaling to apply to the window.
- n_fft_bins (int) – Number of FFT bins per each hop
- remove_reflection (bool) – Should remove reflection above Nyquist
- overwrite (bool) – Overwrite stft_data with current calculation
- use_librosa (bool) – Use librosa’s stft function
Returns: (np.ndarray) Calculated, complex-valued STFT from audio_data, 3D numpy array with shape (n_frequency_bins, n_hops, n_channels).
-
istft(window_length=None, hop_length=None, window_type=None, overwrite=True, use_librosa=False, truncate_to_length=None)
Computes and returns the inverse Short Time Fourier Transform (iSTFT).
The results of the iSTFT calculation can be accessed from audio_data if audio_data is None prior to running this function or overwrite == True.
Warning
If overwrite=True (default) this will overwrite any data in audio_data!
Parameters:
- window_length (int) – Amount of time (in samples) to do an FFT on
- hop_length (int) – Amount of time (in samples) to skip ahead for the new FFT
- window_type (str) – Type of scaling to apply to the window.
- overwrite (bool) – Overwrite audio_data with current calculation
- use_librosa (bool) – Use librosa’s istft function
- truncate_to_length (int) – Truncate resultant signal to specified length. Default None.
Returns: (np.ndarray) Calculated, real-valued iSTFT from stft_data, 2D numpy array with shape (n_channels, n_samples).
-
apply_mask(mask, overwrite=False)
Applies the input mask to the time-frequency representation in this AudioSignal object and returns a new AudioSignal object with the mask applied.
Parameters:
- mask (MaskBase-derived object) – A MaskBase-derived object containing a mask.
- overwrite (bool) – If True, this will alter stft_data in self. If False, this function will create a new AudioSignal object with the mask applied.
Returns: A new AudioSignal object with the input mask applied to the STFT, iff overwrite is False.
-
plot_time_domain(channel=None, x_label_time=True, title=None, file_path_name=None)
Plots a graph of the time domain audio signal.
Parameters:
- channel (int) – The index of the single channel to be plotted
- x_label_time (bool) – Label the x axis with time (True) or samples (False)
- title (str) – The title of the audio signal plot
- file_path_name (str) – The output path of where the plot is saved, including the file name
-
plot_spectrogram(file_name=None, ch=None)
Plots the power spectrogram calculated from audio_data.
Parameters:
- file_name (str) – Path to the output file that will be written.
- ch (int) – If provided, this function will only make a plot of the given channel.
-
concat(other)
Concatenate two AudioSignal objects (by concatenating audio_data).
Puts other.audio_data after audio_data.
Raises: AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.
Parameters: other (AudioSignal) – AudioSignal to concatenate with the current one.
-
truncate_samples(n_samples)
Truncates the signal leaving only the first n_samples samples. This can only be done if self.active_region_is_default is True.
Raises: AudioSignalException – If n_samples > self.signal_length or self.active_region_is_default is False.
Parameters: n_samples – (int) number of samples that will be left.
-
truncate_seconds(n_seconds)
Truncates the signal leaving only the first n_seconds. This can only be done if self.active_region_is_default is True.
Raises: AudioSignalException – If n_seconds > self.signal_duration or self.active_region_is_default is False.
Parameters: n_seconds – (float) number of seconds to truncate audio_data.
-
crop_signal(before, after)
Get rid of samples before and after the signal on all channels. Contracts the length of audio_data by before + after. Useful to get rid of zero padding after the fact.
Parameters:
- before – (int) number of samples to remove at beginning of self.audio_data
- after – (int) number of samples to remove at end of self.audio_data
-
zero_pad(before, after)
Adds zeros before and after the signal to all channels. Extends the length of self.audio_data by before + after.
Raises: Exception – If self.active_region_is_default is False.
Parameters:
- before – (int) number of zeros to be put before the current contents of self.audio_data
- after – (int) number of zeros to be put after the current contents of self.audio_data
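The padding and cropping operations are inverses of each other when given the same before and after counts. The sketch below shows that round trip on plain Python lists laid out as (n_channels, n_samples); `pad_channels` and `crop_channels` are hypothetical helpers for illustration, not the nussl methods themselves.

```python
def pad_channels(channels, before, after):
    """Add `before` zeros in front of and `after` zeros behind every channel."""
    return [[0.0] * before + list(ch) + [0.0] * after for ch in channels]


def crop_channels(channels, before, after):
    """Drop `before` samples from the start and `after` from the end of every channel."""
    return [list(ch[before:len(ch) - after]) for ch in channels]


audio = [[1.0, 2.0], [3.0, 4.0]]          # two channels, two samples each
padded = pad_channels(audio, before=2, after=1)
restored = crop_channels(padded, before=2, after=1)

print(padded)
print(restored)
```

Each channel grows by before + after samples when padded, and cropping with the same counts returns the original data.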
-
peak_normalize(overwrite=True)
Normalizes audio_data so that the maximum of abs(self.audio_data) is 1.0.
Warning
If audio_data is not represented as floats this will convert the representation to floats!
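Peak normalization amounts to dividing every sample by the largest absolute value. The following sketch uses plain Python lists and assumes a single peak taken across all channels; whether nussl normalizes globally or per channel is not stated here, so treat that choice and the `peak_normalize_channels` name as assumptions.

```python
def peak_normalize_channels(channels):
    """Scale all channels so the largest absolute sample value is 1.0."""
    peak = max(abs(x) for ch in channels for x in ch)
    if peak == 0:
        # Silent signal: nothing to scale, return an unchanged copy.
        return [list(ch) for ch in channels]
    return [[x / peak for x in ch] for ch in channels]


audio = [[0.5, -0.25], [0.125, 0.0]]
print(peak_normalize_channels(audio))
```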
-
add(other)
Adds two audio signal objects.
This does element-wise addition on the audio_data array.
Raises: AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.
Parameters: other (AudioSignal) – Other AudioSignal to add.
Returns: (AudioSignal) – New AudioSignal object with the sum of self and other.
-
subtract(other)
Subtracts two audio signal objects.
This does element-wise subtraction on the audio_data array.
Raises: AudioSignalException – If self.sample_rate != other.sample_rate, self.num_channels != other.num_channels, or self.active_region_is_default is False.
Parameters: other (AudioSignal) – Other AudioSignal to subtract.
Returns: (AudioSignal) – New AudioSignal object with the difference between self and other.
-
audio_data_as_ints(bit_depth=16)
Returns audio_data as a numpy array of signed ints with a specified bit-depth.
Available bit-depths are: 8-, 16-, 24-, or 32-bits.
Raises: TypeError – If bit_depth is not one of the above bit-depths.
Notes
audio_data is regularly stored as an array of floats. This will not affect audio_data.
Parameters: bit_depth (int) – Bit depth of the integer array that will be returned.
Returns: (np.ndarray) – Integer representation of audio_data.
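One common way to turn float samples into signed integers is to scale by the largest positive value of the target bit depth. The sketch below follows that convention; the exact scaling and clipping nussl applies is not documented here, so the formula and the `floats_to_ints` name are assumptions made for illustration.

```python
def floats_to_ints(samples, bit_depth=16):
    """Convert floats in [-1.0, 1.0] to signed ints of the given bit depth."""
    if bit_depth not in (8, 16, 24, 32):
        raise TypeError("bit_depth must be 8, 16, 24, or 32")
    max_val = 2 ** (bit_depth - 1) - 1   # e.g. 32767 for 16-bit
    converted = []
    for x in samples:
        x = max(-1.0, min(1.0, x))       # clip out-of-range samples (assumed)
        converted.append(int(round(x * max_val)))
    return converted


print(floats_to_ints([0.0, 1.0, -1.0]))
print(floats_to_ints([1.0], bit_depth=8))
```

The input list is left untouched, mirroring the note above that the conversion does not affect the stored float data.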
-
make_empty_copy(verbose=True)
Makes a copy of this AudioSignal object with audio_data and stft_data initialized to np.ndarrays of the same size, but populated with zeros.
Returns: (AudioSignal) – An AudioSignal object with audio_data and stft_data initialized to np.ndarrays of the same size, but populated with zeros.
-
make_copy_with_audio_data(audio_data, verbose=True)
Makes a copy of this AudioSignal object with audio_data initialized to the input audio_data numpy array. The stft_data of the new AudioSignal object is None.
Parameters:
- audio_data (np.ndarray) – Audio data to be put into the new AudioSignal object.
- verbose (bool) – If True prints warnings. If False, outputs nothing.
Returns: (AudioSignal) – A copy of this AudioSignal object with audio_data initialized to the input audio_data numpy array.
-
make_copy_with_stft_data(stft_data, verbose=True)
Makes a copy of this AudioSignal object with stft_data initialized to the input stft_data numpy array. The audio_data of the new AudioSignal object is None.
Parameters:
- stft_data (np.ndarray) – STFT data to be put into the new AudioSignal object.
- verbose (bool) – If True prints warnings. If False, outputs nothing.
Returns: (AudioSignal) – A copy of this AudioSignal object with stft_data initialized to the input stft_data numpy array.
-
to_json()
Converts this AudioSignal object to JSON.
Returns: (str) – JSON representation of the current AudioSignal object.
-
static from_json(json_string)
Creates a new AudioSignal object from a JSON encoded AudioSignal string.
For best results, json_string should be created from AudioSignal.to_json().
Parameters: json_string (string) – a JSON encoded AudioSignal string
Returns: (AudioSignal) – an AudioSignal object based on the parameters in JSON string
-
rms()
Calculates the root-mean-square of audio_data.
Returns: (float) – Root-mean-square of audio_data.
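For reference, the root-mean-square is the square root of the mean of the squared samples. The sketch below pools every sample from every channel into one mean; whether nussl pools channels this way is an assumption, and `root_mean_square` is a hypothetical plain-Python helper.

```python
import math


def root_mean_square(channels):
    """RMS over all samples of all channels (pooled together)."""
    samples = [x for ch in channels for x in ch]
    return math.sqrt(sum(x * x for x in samples) / len(samples))


print(root_mean_square([[3.0, 4.0], [0.0, 0.0]]))
```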
-
get_closest_frequency_bin(freq)
Returns index of the closest element to freq in the stft_data. Assumes linearly spaced frequency bins.
Parameters: freq (int) – Frequency to retrieve in Hz.
Returns: (int) index of closest frequency to input freq
Example
    # Make a low pass filter starting around 1200 Hz
    signal = nussl.AudioSignal('path_to_song.wav')
    signal.stft()
    idx = signal.get_closest_frequency_bin(1200)  # 1200 Hz
    signal.stft_data[idx:, :, :] = 0.0  # eliminate everything above idx
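With linearly spaced bins, finding the closest bin reduces to dividing by the bin spacing and rounding. The sketch below assumes the bins run from 0 Hz to the Nyquist frequency inclusive, which is typical when the reflection above Nyquist has been removed; that layout and the `closest_frequency_bin` helper are assumptions, not nussl's code.

```python
def closest_frequency_bin(freq, sample_rate, n_bins):
    """Index of the linearly spaced bin (0 Hz .. Nyquist) nearest to `freq`."""
    nyquist = sample_rate / 2
    spacing = nyquist / (n_bins - 1)      # Hz between adjacent bins
    idx = round(freq / spacing)
    return max(0, min(n_bins - 1, idx))   # clamp to a valid index


# 1025 bins at 44.1 kHz gives a spacing of about 21.5 Hz per bin.
print(closest_frequency_bin(1200, 44100, 1025))
```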
-
apply_gain(value)
Apply a gain to audio_data.
Parameters: value (float) – amount to multiply self.audio_data by
Returns: (AudioSignal) – This AudioSignal object with the gain applied.
-
resample(new_sample_rate)
Resample the data in audio_data to the new sample rate provided by new_sample_rate. If new_sample_rate is the same as sample_rate then nothing happens.
Parameters: new_sample_rate (int) – The new sample rate of audio_data.
-
get_channel(n)
Gets audio data of n-th channel from audio_data as a 1D np.ndarray of shape (n_samples,).
Parameters: n (int) – index of channel to get. 0-based
See also
get_channels(): Generator for looping through channels of audio_data.
get_stft_channel(): Gets stft data from a specific channel.
get_stft_channels(): Generator for looping through channels from stft_data.
Raises: AudioSignalException – If not 0 <= n < self.num_channels.
Returns: (np.array) – The audio data in the n-th channel of the signal, 1D
-
get_channels()
Generator that will loop through channels of audio_data.
See also
get_channel(): Gets audio data from a specific channel.
get_stft_channel(): Gets stft data from a specific channel.
get_stft_channels(): Generator to loop through channels of stft_data.
Yields: (np.array) – The audio data in the next channel of this signal as a 1D np.ndarray.
-
get_stft_channel(n)
Returns STFT data of n-th channel from stft_data as a 2D np.ndarray.
Parameters: n – (int) index of stft channel to get. 0-based
See also
get_stft_channels(): Generator to loop through channels from stft_data.
get_channel(): Gets audio data from a specific channel.
get_channels(): Generator to loop through channels of audio_data.
Raises: AudioSignalException – If not 0 <= n < self.num_channels.
Returns: (np.array) – the STFT data in the n-th channel of the signal, 2D
-
get_stft_channels()
Generator that will loop through channels of stft_data.
See also
get_stft_channel(): Gets stft data from a specific channel.
get_channel(): Gets audio data from a specific channel.
get_channels(): Generator to loop through channels of audio_data.
Yields: (np.array) – The STFT data in the next channel of this signal as a 2D np.ndarray.
-
make_audio_signal_from_channel(n)
Makes a new AudioSignal object with data from channel n.
Parameters: n (int) – index of channel to make a new signal from. 0-based
Returns: (AudioSignal) new AudioSignal object with only data from channel n.
-
get_power_spectrogram_channel(n)
Returns the n-th channel from self.power_spectrogram_data.
Raises: Exception – If not 0 <= n < self.num_channels.
Parameters: n – (int) index of power spectrogram channel to get. 0-based
Returns: (np.array) – the power spectrogram data in the n-th channel of the signal, 2D
-
get_magnitude_spectrogram_channel(n)
Returns the n-th channel from self.magnitude_spectrogram_data.
Raises: Exception – If not 0 <= n < self.num_channels.
Parameters: n – (int) index of magnitude spectrogram channel to get. 0-based
Returns: (np.array) – the magnitude spectrogram data in the n-th channel of the signal, 2D
-
to_mono(overwrite=False, keep_dims=False)
Converts audio_data to mono by averaging every sample across channels.
Parameters:
- overwrite (bool) – If True this function will overwrite audio_data.
- keep_dims (bool) – If False this function will return a 1D array, else will return array with shape (1, n_samples).
Warning
If overwrite=True this will overwrite any data in audio_data!
Returns: (np.array) – Mono-ed version of audio_data.
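Averaging to mono means taking, at each sample index, the mean of that sample across all channels. A minimal sketch on plain Python lists laid out as (n_channels, n_samples) follows; `average_to_mono` is a hypothetical helper and returns the 1D form, corresponding to keep_dims=False.

```python
def average_to_mono(channels):
    """Average each sample position across channels, returning a 1D list."""
    n_channels = len(channels)
    return [sum(frame) / n_channels for frame in zip(*channels)]


stereo = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
print(average_to_mono(stereo))
```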
-
stft_to_one_channel(overwrite=False)
Converts stft_data to a single channel by averaging every sample across channels. The shape of stft_data will be (num_freq, num_time, 1) (where the last axis is the channel number).
Parameters: overwrite (bool) – If True this function will overwrite stft_data.
Warning
If overwrite=True this will overwrite any data in stft_data!
Returns: (np.array) – Single channel version of stft_data.