CS 352 MACHINE PERCEPTION OF MUSIC AND AUDIO
Northwestern University Winter 2021
This course covers machine extraction of structure from audio files, including source separation (unmixing audio recordings into individual component sounds), sound object recognition (labeling sounds), melody tracking, beat tracking, and perceptual mapping of audio to machine-quantifiable measures.
This course is approved for the Breadth Interfaces & project requirement in the CS curriculum.
Students need prior programming experience sufficient to do laboratory assignments in PYTHON, implementing algorithms and using libraries without being taught to do so (the course provides no instruction in the Python language). Having taken EECS 211 and 214 would demonstrate this experience.
Time & Place
Lecture: Tuesday, Thursday, 6:30 - 7:50pm CST on ZOOM
Prof. Bryan Pardo Office Hours & Location:
Mondays 5:00 - 6:30pm CST on ZOOM
Questions outside of class
Please use CampusWire for class-related questions.
You will be graded on a 100-point scale (e.g., 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).
Homework and reading assignments are solo assignments and must be original work.
Final projects are group assignments and all members of a group will share a grade for all parts of the assignment.
Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
Students can earn a MAXIMUM TOTAL of 10 extra-credit points (a full letter grade):
Participation during lecture: You will be asked to select 2 lectures for which you will be on call. In your on-call lectures, I will feel free to call on you, and I will expect that you’ve done the relevant reading beforehand and can engage in meaningful interaction on the lecture topic. Each on-call day is worth 3 points, for a total of 6 class participation points.
Paper reviews: You can earn extra credit by submitting reviews of up to 4 extra-credit papers in the field. Each paper review is worth 1 point, for a total of 4 paper review points.
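The point arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that extra-credit points are added directly to the homework total, which the policy implies ("a full letter grade") but does not spell out.

```python
# Point values taken from the grading policy above.
ON_CALL_POINTS = 3      # points per on-call lecture
MAX_ON_CALL_DAYS = 2    # each student selects 2 on-call lectures
REVIEW_POINTS = 1       # points per extra-credit paper review
MAX_REVIEWS = 4         # at most 4 reviews count
MAX_EXTRA_CREDIT = 10   # overall cap on extra credit


def extra_credit(on_call_days, paper_reviews):
    """Return extra-credit points, capped at the stated maximum."""
    on_call = ON_CALL_POINTS * min(on_call_days, MAX_ON_CALL_DAYS)
    reviews = REVIEW_POINTS * min(paper_reviews, MAX_REVIEWS)
    return min(on_call + reviews, MAX_EXTRA_CREDIT)


def total_points(homework_scores, on_call_days, paper_reviews):
    """Sum homework points and extra credit.

    ASSUMPTION: extra credit is simply added to the homework total.
    """
    return sum(homework_scores) + extra_credit(on_call_days, paper_reviews)
```

For example, under these assumptions a student with homework scores of 20, 18, 15, 20, and 17, one on-call lecture, and two paper reviews would have 90 + 5 = 95 points.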
| Week | Date | Topic | Assignment | Points |
|---|---|---|---|---|
| 1 | Tue Jan 12 | Course intro, Recording basics | | |
| 1 | Thu Jan 14 | How we hear, Frequency & Pitch | | |
| 2 | Tue Jan 19 | Loudness & Amplitude | | |
| 2 | Thu Jan 21 | The Fourier Series & Spectrogram | | |
| 3 | Tue Jan 26 | The Fourier Series & Spectrogram | | |
| 3 | Thu Jan 28 | Convolution, Reverb | HW 1 | 20 |
| 4 | Tue Feb 2 | Correlation and Reverb | | |
| 4 | Thu Feb 4 | Convolution and Filtering | | |
| 5 | Tue Feb 9 | Time-frequency masking | | |
| 5 | Thu Feb 11 | Audio Similarity & KNN | HW 2 | 20 |
| 6 | Tue Feb 16 | Labeling Sound Events | | |
| 6 | Thu Feb 18 | Audio Fingerprinting (Shazam) | | |
| 7 | Tue Feb 23 | Deep Learning (briefly) | HW 3 | 20 |
| 7 | Thu Feb 25 | Deep Source Separation | | |
| 8 | Tue Mar 2 | Deep Embeddings | | |
| 8 | Thu Mar 4 | Pitch tracking | | |
| 9 | Tue Mar 9 | Deep Models for Audio | HW 4 | 20 |
| 9 | Thu Mar 11 | Current work in the lab | | |
| 10 | Thu Mar 18 | Final assignment due | HW 5 | 20 |
EXTRA CREDIT READING (This list will be added to)
Chapter 4 of Machine Learning - This is Tom Mitchell’s book. It gives a historical overview plus an explanation of backpropagation of error, and it’s a good starting point for actually understanding deep nets. IT’S WORTH 2 EXTRA-CREDIT READINGS. THE CATCH IS THAT YOU HAVE TO WRITE 2 PAGES TO GET THE 2 POINTS.
Mel-frequency Cepstral Coefficients
The dummy’s guide to MFCC - an easy, high-level read. Start with this.
From Frequency to Quefrency: A History of the Cepstrum - a historical analysis of the uses of the cepstrum.
Recovering sound sources from embedded repetition - This is a paper on how humans actually listen to and parse audio based on repetition. Read any time.
Paper coming…as soon as I find one I like.
Audio source separation (demixing)
Deep clustering: Discriminative embeddings for segmentation and separation - Don’t try to read this till you know something about deep learning.
Audio search and recommendation
Lessons learned building a large music recommender system (this one is a video) - This is a talk by the chief researcher on music recommendation at Pandora. Watch any time.
An Industrial-Strength Audio Search Algorithm (Shazam) - Describes how the popular Shazam app for music audio ID works.
OtoMechanic: Auditory Automobile Diagnostics via Query-by-Example - A deep-learning based sound ID system. Also, watch the related video.
A Human-in-the-Loop System for Sound Event Detection and Annotation - An interactive sound-event labeler. Also, watch the related video.
YIN, a fundamental frequency estimator for speech and music - This is perhaps the most popular pitch tracker.
CREPE: A Convolutional Representation for Pitch Estimation - A deep learning pitch tracker that improves on YIN.
Places to get ideas
Essentia: an open-source music analysis toolkit that includes a bunch of feature extractors and pre-trained models for extracting, e.g., beats per minute, mood, and genre.