# Spectrograms and STFTs

Most, if not all, source separation algorithms perform their operations in the frequency domain rather than
in the time domain. For this, *nussl* provides an interface for working with Short-Time Fourier Transform (STFT)
data. Here, we describe how to do some simple STFT operations with the `nussl.AudioSignal` object.

## STFT Basics

Let’s reinitialize `signal1` from the previous page. We should be able to get frequency domain data by looking
at `signal1.stft_data`. Let’s try that.

```
>>> signal1 = nussl.AudioSignal(input_file_path)
>>> signal1.stft_data
None
```

Whoops! Because this object was initialized from a .wav file (i.e., time-series data), this `AudioSignal` object has
no frequency domain data by default. To populate it with frequency data, we compute an STFT:

```
>>> signal1.stft()
```

Aha! Now we can examine how STFT data is stored in the `AudioSignal` object. Similar to `signal1.audio_data`,
STFT data is stored in a (complex-valued) numpy array called `signal1.stft_data` [1].

```
>>> signal1.stft_data
array([[[ 5.65585184e+00 -0.00000000e+00j,
9.01010437e+01 -0.00000000e+00j],
[ -2.49999994e-03 -3.83490305e-06j,
-2.49999994e-03 -3.83490305e-06j],
[ -2.49999994e-03 -3.83490305e-06j,
-2.49999994e-03 -3.83490305e-06j],
[ -2.49999994e-03 -3.83490305e-06j,
-2.49999994e-03 -3.83490305e-06j],
[ -4.15182253e-03 -3.05598299e-03j,
-5.95030794e-03 -6.10865979e-03j],
[ 7.39212409e-02 +2.67153326e-02j,
1.48953497e-01 +5.34350201e-02j]],
...,
[[ -1.25701912e-03 -3.83491215e-06j,
6.89334124e-02 +2.11507810e-04j],
[ -2.49999994e-03 -7.66982430e-06j,
-2.49999994e-03 -7.66982430e-06j],
[ -2.49999994e-03 -7.66982430e-06j,
-2.49999994e-03 -7.66982430e-06j],
[ -2.49999994e-03 -7.66982430e-06j,
-2.49999994e-03 -7.66982430e-06j],
[ -2.51177116e-03 -5.69828972e-03j,
-2.67004268e-03 -1.13899261e-02j],
[ -4.96160686e-02 -4.97913919e-02j,
-1.36735573e-01 -0.00000000e+00j]]], dtype=complex64)
>>> signal1.stft_data.shape
(1025, 88, 2)
```

By inspecting the shape we see that the first dimension represents the number of FFT bins taken at each hop,
the second represents the length of our signal (in hops), and the third dimension is the number of channels. There is
an easy way to get this data from our `AudioSignal` object without having to index into the shape:

```
>>> signal1.stft_length
88
>>> signal1.num_fft_bins
1025
```
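To see where these three dimensions come from, here is a minimal NumPy-only sketch of an STFT. It is not *nussl*'s implementation: the function name `simple_stft`, the window length of 2048, the hop length of 512, and the lack of padding are all assumptions made for illustration, so the hop count will not match the 88 shown above.

```python
import numpy as np

def simple_stft(audio, window_length=2048, hop_length=512):
    """Toy STFT (hypothetical helper). audio has shape (num_channels, num_samples)."""
    window = np.hanning(window_length)
    num_channels, num_samples = audio.shape
    # No padding here, so only frames that fit entirely inside the signal are used.
    num_hops = 1 + (num_samples - window_length) // hop_length
    # Keep only the non-redundant half of each FFT (reflection removed).
    num_bins = window_length // 2 + 1
    stft = np.zeros((num_bins, num_hops, num_channels), dtype=np.complex64)
    for ch in range(num_channels):
        for hop in range(num_hops):
            start = hop * hop_length
            frame = audio[ch, start:start + window_length] * window
            stft[:, hop, ch] = np.fft.rfft(frame)
    return stft

rng = np.random.default_rng(0)
audio = rng.standard_normal((2, 44100))  # one second of stereo noise at an assumed 44.1 kHz
stft = simple_stft(audio)
num_fft_bins, stft_length, num_channels = stft.shape
print(num_fft_bins, stft_length, num_channels)
```

The layout is the same as in `signal1.stft_data`: bins first, then hops, then channels.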

We can get a single STFT channel like so:

```
>>> signal1.get_stft_channel(1) # see footnote 2 on AudioSignal Basics page
array([[ 1.23336256 -0.00000000e+00j, 0.03598116 -0.00000000e+00j,
0.10520950 -0.00000000e+00j, ..., 0.06182364 -0.00000000e+00j,
-1.39272857 -0.00000000e+00j, -0.27395117 -0.00000000e+00j],
[-1.45443594 +2.95939099e-04j, 0.03598259 +4.79334674e-04j,
0.10521182 +1.84503690e-04j, ..., 0.06182393 -5.87543298e-04j,
1.07824659 +5.23063958e-01j, -0.26018760 -6.48025423e-02j],
[ 1.23345983 +5.91909746e-04j, 0.03598688 +9.58720455e-04j,
0.10521877 +3.69027053e-04j, ..., 0.06182481 -1.17514888e-03j,
-0.96882135 -9.56665993e-01j, -0.22125557 -1.18454359e-01j],
...,
[-0.19744445 -8.19548659e-05j, -0.04824998 -1.32742891e-04j,
-0.03122028 -5.10948921e-05j, ..., 0.04447741 +1.62709228e-04j,
0.13960634 -1.32384062e-01j, 0.00856382 -1.65341552e-02j],
[ 0.17461725 -4.09772983e-05j, -0.04824989 -6.63712271e-05j,
-0.03122014 -2.55473606e-05j, ..., 0.04447743 +8.13543447e-05j,
-0.14375341 +7.24206716e-02j, 0.01395597 -9.03958548e-03j],
[-0.19744259 -0.00000000e+00j, -0.04824987 -0.00000000e+00j,
-0.03122010 -0.00000000e+00j, ..., 0.04447743 -0.00000000e+00j,
0.19829373 -0.00000000e+00j, 0.01586237 -0.00000000e+00j]], dtype=complex64)
```

We can also get power spectrogram data from our signal. As we would expect, this has the same
shape as `signal1.stft_data`.

```
>>> signal1.power_spectrogram_data # np.abs(signal1.stft_data) ** 2
array([[[ 1.52118325e+00],
[ 1.29464362e-03],
[ 1.10690389e-02],
...,
[ 1.97824207e-03],
[ 3.93204018e-02],
[ 2.51614663e-04]]], dtype=float32)
>>> signal1.power_spectrogram_data.shape
(1025, 88, 2)
```
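The comment in the example above gives the definition: the power spectrogram is the squared magnitude of each complex STFT value. A small NumPy check on made-up data (the array below is invented for illustration, not real audio) shows that the result is real-valued and keeps the shape of the input:

```python
import numpy as np

# Made-up STFT data with shape (2 bins, 2 hops, 1 channel).
stft_data = np.array([[[3 + 4j], [0 + 1j]],
                      [[1 - 1j], [2 + 0j]]], dtype=np.complex64)

# Squared magnitude, i.e. real**2 + imag**2 for every entry.
power = np.abs(stft_data) ** 2

print(power.shape)     # same shape as stft_data
print(power[:, :, 0])  # the single channel, as a (bins, hops) matrix
```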

## Inverse STFTs

Let’s do something a little more interesting with our `AudioSignal` object. Since `signal1.stft_data` is just
a regular numpy array, we can access and manipulate it as such. So let’s implement a low pass filter by creating a
new `AudioSignal` object and leaving `signal1` unaltered.

Let’s eliminate all frequencies above about 400 Hz in our signal.

```
>>> import numpy as np
>>> lp_stft = signal1.stft_data.copy()
>>> lp_cutoff = 400 # Hz
>>> frequency_vector = signal1.freq_vector # a vector of frequency values for each FFT bin
>>> idx = (np.abs(frequency_vector - lp_cutoff)).argmin() # trick to find the index of the closest value to 400 Hz
>>> lp_stft[idx:, :, :] = 0.0j # every bin from the one closest to 400 Hz upward is 0 now
```
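The `argmin` trick is easy to check with plain NumPy. This sketch assumes a 44.1 kHz sample rate and a 2048-point FFT, and builds the frequency vector with `np.fft.rfftfreq`; how *nussl* computes `freq_vector` internally may differ.

```python
import numpy as np

sample_rate = 44100  # assumed sample rate in Hz
n_fft = 2048         # assumed FFT size, which gives 1025 bins
freq_vector = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

lp_cutoff = 400  # Hz
# Distance of every bin's center frequency from the cutoff; the smallest wins.
idx = np.abs(freq_vector - lp_cutoff).argmin()

# Stand-in STFT data with shape (bins, hops, channels), all ones.
stft = np.ones((freq_vector.size, 4, 2), dtype=np.complex64)
stft[idx:, :, :] = 0.0j  # zero the closest bin and everything above it

print(idx, freq_vector[idx])
```

With these settings the bins are about 21.5 Hz apart, so the closest bin sits slightly above 400 Hz. That is why the cutoff is only "about" 400 Hz.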

Okay, so now we have low-passed STFT data in the numpy array `lp_stft`. Now we are going to see how we can initialize
a new `AudioSignal` object using this data.

```
>>> signal1_lp = nussl.AudioSignal(stft=lp_stft)
```

Easy-peasy! Now `signal1_lp` is a new `AudioSignal` object that has been initialized with STFT data instead of
time-series data. Before we can write this to a file, we need to do an inverse STFT to get back time-series data.

```
>>> signal1_lp.audio_data
None
>>> signal1_lp.istft()
>>> signal1_lp.write_audio_to_file('path/to/signal1_lowpass.wav')
```

Cool beans!
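For readers curious about what the low pass and inverse STFT steps do numerically, here is a self-contained NumPy sketch of the same idea on a synthetic mono signal. The helpers `stft` and `istft` below are simplified stand-ins written for this example (Hann window, weighted overlap-add), not *nussl*'s or librosa's implementations, and the sample rate and tone frequencies are arbitrary choices.

```python
import numpy as np

def stft(x, window, hop):
    """Toy forward STFT of a mono signal; returns shape (num_bins, num_hops)."""
    n = len(window)
    frames = [np.fft.rfft(x[s:s + n] * window) for s in range(0, len(x) - n + 1, hop)]
    return np.array(frames).T

def istft(spec, window, hop):
    """Toy weighted overlap-add inverse of stft() above."""
    n = len(window)
    num_hops = spec.shape[1]
    out = np.zeros(n + hop * (num_hops - 1))
    norm = np.zeros_like(out)
    for i in range(num_hops):
        s = i * hop
        out[s:s + n] += np.fft.irfft(spec[:, i], n=n) * window
        norm[s:s + n] += window ** 2
    covered = norm > 1e-8  # skip the very edges, where the window is ~0
    out[covered] /= norm[covered]
    return out

sample_rate = 8000  # arbitrary sample rate for the synthetic signal
n, hop = 1024, 256
t = np.arange(n + hop * 40) / sample_rate
low = np.sin(2 * np.pi * 200 * t)    # should survive the filter
high = np.sin(2 * np.pi * 2000 * t)  # should be removed
window = np.hanning(n)

spec = stft(low + high, window, hop)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
idx = np.abs(freqs - 400).argmin()
spec[idx:, :] = 0.0j  # low pass at about 400 Hz
filtered = istft(spec, window, hop)

# Compare away from the edges, where the overlap-add is incomplete.
error = np.max(np.abs(filtered[n:-n] - low[n:-n]))
print(error)
```

Away from the signal edges, the filtered output is very close to the 200 Hz tone alone.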

## STFT Parameters

I wanted to make a few quick notes about *nussl*’s parameter settings for STFTs and iSTFTs. Let’s have a quick look
at the function signatures for `AudioSignal.stft()` and `AudioSignal.istft()`:

```
def stft(self, window_length=None, hop_length=None, window_type=None, n_fft_bins=None,
         remove_reflection=True, overwrite=True, use_librosa=True):
    ...
def istft(self, window_length=None, hop_length=None, window_type=None, n_fft_bins=None, overwrite=True,
          reconstruct_reflection=True, use_librosa=True):
    ...
```

Almost all low level parameters are accessible through this interface and can be adjusted accordingly.

As with `to_mono()` on the previous page, `stft()` and `istft()` have an `overwrite` parameter that controls whether
the internal data representations are overwritten. It is `True` by default, so be sure to set it to `False` when needed.

While *nussl* does have its own STFT and iSTFT implementations, it also contains wrappers for
librosa’s STFT and iSTFT functions. There is a trade-off between the two: based on our tests, librosa’s is faster,
but *nussl*’s produces more accurate signal reconstruction. Some algorithms produce artifacts with *nussl*’s STFTs,
so *nussl* defaults to using librosa’s STFT functions [2].

The default settings for forward and inverse STFTs are guaranteed to produce invertible results without crashing.
But because there are so many possibilities, *nussl* assumes the user knows the correct way to compute both the STFT
and the iSTFT. E.g., if you do not remove the FFT reflection when doing an STFT, *nussl* will not automatically
know not to reconstruct the reflection when doing an inverse STFT. It is the user’s responsibility to do this
kind of bookkeeping.
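The "reflection" is the redundant, conjugate-symmetric upper half of the FFT of a real-valued frame. The NumPy sketch below uses a single random frame of an assumed length of 2048 to show why the forward and inverse settings have to match; it illustrates the general idea and says nothing about how *nussl* handles these flags internally.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.standard_normal(2048)  # one real-valued frame of assumed length

full = np.fft.fft(frame)   # 2048 bins, reflection included
half = np.fft.rfft(frame)  # 1025 bins, reflection removed

# Rebuild the reflection: conjugates of bins 1023 down to 1, appended in that order.
rebuilt = np.concatenate([half, np.conj(half[-2:0:-1])])

# Matching pairs recover the original frame.
from_half = np.fft.irfft(half, n=frame.size)
from_full = np.fft.ifft(rebuilt).real

# A mismatched pair: a full inverse FFT applied to reflection-free data
# treats the 1025 bins as a complete spectrum and returns the wrong length.
mismatched = np.fft.ifft(half)

print(full.size, half.size, mismatched.size)
```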

### StftParams Object

`nussl.StftParams` is an object that stores parameters for doing STFTs and iSTFTs. Its main function is to
keep track of user-set parameters for the duration of the `AudioSignal` object’s life. The separation
objects (`nussl.Repet`, for instance) also have a `StftParams` object, which is copied from the input
`AudioSignal` object.

Let’s get to some examples to demonstrate.

We can calculate an STFT with special, non-default parameters:

```
>>> signal1 = nussl.AudioSignal(input_file_path)
>>> signal1.stft(window_length=4096, hop_length=1024, window_type=constants.WINDOW_HANN)
```

But the next time we do an STFT, we’ll have the default parameters again. And *furthermore*, any other code
that calls `signal1.stft()` (like `nussl.Repet`) will use the default parameters.

If we want to have these settings saved for the life of this `AudioSignal` object, we have to set them
in `signal1`’s `StftParams` object, at `signal1.stft_params`.

```
>>> signal1 = nussl.AudioSignal(input_file_path)
>>> signal1.stft_params.window_length = 4096
>>> signal1.stft_params.hop_length = 1024
>>> signal1.stft_params.window_type = constants.WINDOW_HANN
>>> signal1.stft()
```

This block of code is equivalent to the first, but now when we call `signal1.stft()` again it will use the same
settings.

So now when `Repet` (or any other separation method in *nussl*) does an STFT, it will use our settings.

```
>>> my_repet = nussl.Repet(signal1)
>>> my_repet.run()
```

The STFT inside `Repet` used the parameters we set for `signal1`. Automatically, `my_repet` has a copy of
`signal1.stft_params`:

```
>>> my_repet.stft_params.window_length
4096
>>> my_repet.stft_params.hop_length
1024
>>> my_repet.stft_params.window_type
'hann'
```

What if we want to change just one of the STFT parameters for only this `Repet` instance? We can change the STFT
parameters for `my_repet` and they won’t affect `signal1.stft_params`.

```
>>> my_repet.stft_params.hop_length = 2048
>>> my_repet.run()
```

Now when we run `my_repet`, the hop length is 2048 instead of the 1024 that was set in `signal1`.
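The copy behaviour described in this section can be pictured with a toy model. Everything in the sketch below is hypothetical: `ToyStftParams`, `ToySignal`, and `ToySeparator` are stand-ins invented for this example, the default values are placeholders, and *nussl*'s real classes may be structured quite differently.

```python
import copy
from dataclasses import dataclass

@dataclass
class ToyStftParams:
    """Hypothetical stand-in for a parameter object; defaults are placeholders."""
    window_length: int = 2048
    hop_length: int = 512
    window_type: str = 'hann'

class ToySignal:
    """Hypothetical stand-in for a signal that owns its STFT parameters."""
    def __init__(self):
        self.stft_params = ToyStftParams()

class ToySeparator:
    """Hypothetical stand-in for a separation object."""
    def __init__(self, signal):
        # Take a copy, so later edits on either side stay independent.
        self.stft_params = copy.copy(signal.stft_params)

signal = ToySignal()
signal.stft_params.window_length = 4096
signal.stft_params.hop_length = 1024

separator = ToySeparator(signal)      # inherits 4096 / 1024 via the copy
separator.stft_params.hop_length = 2048  # only the separator's copy changes

print(signal.stft_params.hop_length, separator.stft_params.hop_length)
```

Because the separator holds a copy rather than a reference, the signal keeps its hop length of 1024.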

Footnotes

[1] All of the Python console output on this page has been truncated for brevity.

[2] This may change in a future release.