Multi-resolution Common Fate Transform
Fatemeh Pishdadian and Bryan Pardo
Demos and Source Code
This work was supported by United States National Science Foundation award number 1420971
The Multi-resolution Common Fate Transform (MCFT) is an audio signal representation developed in our lab in the context of audio source separation.
Audio source separation is the process of extracting a single sound (e.g., one violin) from a mixture of sounds (e.g., a string quartet). Many audio source separation algorithms operate on audio representations, such as the magnitude spectrogram, that cannot resolve sounds that overlap in both time and frequency. This limits their ability to separate sounds. Some researchers have developed biologically informed auditory models that can represent time-frequency overlapped sounds separately, but the process of creating the representation loses information, so the separated sounds cannot be reconstructed from it. This limits the utility of biologically informed model output for source separation.
The MCFT:
- Combines the invertibility of the Common Fate Transform (CFT) with the multi-resolution property of the cortical stage output of an auditory model.
- Explicitly represents spectro-temporal modulation patterns of audio signals and thus facilitates the separation of signals that overlap in the time-frequency domain.
- Has been shown to provide higher separability than the Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and CFT.
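The idea behind the first two points can be illustrated with a toy example. The sketch below is not the MCFT: it assumes a synthetic time-frequency grid and uses a plain single-resolution 2D Fourier transform as a stand-in for modulation analysis. All names and parameter values (`n_freq`, `n_time`, the ripple cycle counts) are made up for illustration. It builds two ripple patterns that overlap at every time-frequency point, one drifting upward and one drifting downward, and shows that they occupy different regions of the modulation domain and can be recovered by masking and inverting.

```python
import numpy as np

# Hypothetical toy grid: 64 frequency bins by 128 time frames.
n_freq, n_time = 64, 128
f = np.arange(n_freq)[:, None]
t = np.arange(n_time)[None, :]

# Two ripples with the same spectral and temporal modulation rates but
# opposite drift directions. They overlap everywhere on the grid.
up = np.cos(2 * np.pi * (4 * f / n_freq + 6 * t / n_time))
down = np.cos(2 * np.pi * (4 * f / n_freq - 6 * t / n_time))
mixture = up + down

# A 2D Fourier transform maps the grid to a modulation domain
# (spectral modulation along axis 0, temporal modulation along axis 1).
M = np.fft.fft2(mixture)

# The transform is invertible: the mixture is recovered exactly
# (up to floating-point error).
recon = np.fft.ifft2(M).real

# The two drift directions fall in different quadrants, identified here
# by the sign of the product of the two modulation frequencies.
scale = np.fft.fftfreq(n_freq)[:, None]
rate = np.fft.fftfreq(n_time)[None, :]
mask_up = (scale * rate) > 0
mask_down = (scale * rate) < 0

# Mask in the modulation domain, then invert to get each component.
up_est = np.fft.ifft2(M * mask_up).real
down_est = np.fft.ifft2(M * mask_down).real

print("mixture reconstructed:", np.allclose(recon, mixture))
print("upward ripple recovered:", np.allclose(up_est, up))
print("downward ripple recovered:", np.allclose(down_est, down))
```

Real audio is far less tidy than this, which is where a multi-resolution analysis becomes useful; the details of the actual transform are given in the publications listed on this page.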
Related publications
[pdf] Fatemeh Pishdadian, Bryan Pardo, and Antoine Liutkus, “A multi-resolution approach to common fate-based audio separation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.
[pdf] Fatemeh Pishdadian and Bryan Pardo, “Multi-Resolution Common Fate Transform,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.