CS 352 MACHINE PERCEPTION OF MUSIC AND AUDIO

Northwestern University Spring 2025

Course Description

This course covers machine extraction of structure from audio files, in areas such as source separation (unmixing audio recordings into their individual component sounds), sound object recognition (labeling sounds), melody tracking, beat tracking, and perceptual mapping of audio to machine-quantifiable measures.

This course is approved for the Interfaces breadth requirement and the project requirement in the CS curriculum.

Prerequisites

You need prior programming experience sufficient to do the laboratory assignments in Python, implementing algorithms and using libraries without being taught how (Python itself is not taught in this course). Having taken EECS 211 and 214 would demonstrate this experience.

Course Textbook

Fundamentals of Music Processing

Time & Place

Lecture: Monday, Wednesday, 3:30 - 4:50pm Central Time in Tech M128

Instructors & Office Hours

Prof. Bryan Pardo Tuesdays 10am - 11am in Mudd 3115

TA Annie Chu Mondays, 11am - 1pm in Mudd 3207

PM Nathan Pruyne Mondays, 5-7pm in Mudd 3207

Postdoc Jason Smith No office hours until the last week of class. This will be updated then.

Course Policies

Questions outside of class

Please use CampusWire for class-related questions.

Grading Policy

You will be graded on a 100-point scale (93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).
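
If you prefer to see the cutoffs as code, here is a minimal Python sketch. Only the grades spelled out above are implemented, since the policy leaves the rest to "and so on":

def letter_grade(score):
    """Map a 0-100 score to a letter grade using the cutoffs
    listed in the grading policy above. Grades below B- follow
    the same pattern and are not spelled out here."""
    if score >= 93:
        return "A"
    elif score >= 90:
        return "A-"
    elif score >= 87:
        return "B+"
    elif score >= 83:
        return "B"
    elif score >= 80:
        return "B-"
    else:
        return "below B- (pattern continues per the policy)"

print(letter_grade(91))  # prints: A-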

Homework and reading assignments are solo assignments and must be original work.

Submitting assignments

Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.

Extra credit

Students can earn a MAXIMUM TOTAL of 10 extra-credit points (a full letter grade) by doing an extra-credit assignment. For this assignment, you will pair up in groups of two, and both people in the group will get the same grade.

Course Calendar

Week | Date       | Topic                                 | Assignment                  | Points
1    | Tue Apr 1  | Course intro, Recording basics        |                             |
1    | Wed Apr 2  | Frequency & Pitch, Tuning Systems     |                             |
2    | Mon Apr 7  | Loudness & Amplitude                  |                             |
2    | Wed Apr 9  | Fourier Transforms & Spectrograms     |                             |
3    | Mon Apr 14 | Convolution & Filtering               |                             |
3    | Wed Apr 16 | Convolution & FFT notebooks           | HW 1: Audio basics          | 20
4    | Mon Apr 21 | Time-frequency Masking                |                             |
4    | Wed Apr 23 | The REPET-SIM algorithm               |                             |
5    | Mon Apr 28 | Research in the Interactive Audio Lab |                             |
5    | Wed Apr 30 | MFCCs and Chromagrams (Jason)         | HW 2: Spectrograms, Masking | 20
6    | Mon May 5  | The Infinite Jukebox                  |                             |
6    | Wed May 7  | Pitch Tracking                        |                             |
7    | Mon May 12 | Deep Learning                         | HW 3: Infinite Jukebox      | 20
7    | Wed May 14 | Deep Learning                         |                             |
8    | Mon May 19 | Embeddings & Sound Object Labeling    |                             |
8    | Wed May 21 | VoiceID                               |                             |
9    | Mon May 26 | Cross Modal Embeddings                | HW 4: Pitch Tracking        | 20
9    | Wed May 28 | Text2FX                               |                             |
10   | Mon Jun 2  | Gesture Tracking for music control    |                             |
10   | Wed Jun 4  | Gesture Tracking for music control    | HW 5: Using Embeddings      | 20
11   | Wed Jun 11 | Extra credit assignment due           | XC: Gesture Control         | 10

Demo Code

Convolution demo notebook
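
If you want a taste of what the notebook covers before opening it, here is a minimal sketch (not the notebook itself) of convolution as filtering, assuming only NumPy and SciPy:

import numpy as np
from scipy.signal import fftconvolve

sr = 8000                                   # sample rate in Hz
t = np.arange(sr) / sr                      # one second of time stamps
signal = np.sin(2 * np.pi * 440 * t)        # a 440 Hz sine
noisy = signal + 0.5 * np.random.randn(sr)  # add white noise

# A 32-tap moving-average kernel acts as a crude low-pass filter:
# convolving with it smooths out the high-frequency noise.
kernel = np.ones(32) / 32
smoothed = fftconvolve(noisy, kernel, mode="same")

print(noisy.std(), smoothed.std())  # the filtered signal varies less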

Textbook Reading

Fundamentals of Music Processing, Chapter 1

Fundamentals of Music Processing, Chapter 2 & Section 3.1

Fundamentals of Music Processing, Chapter 4

Fundamentals of Music Processing, Chapter 6

Fundamentals of Music Processing, Chapter 7

REPET for Background/Foreground Separation in Audio
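
The REPET paper above, like the Week 4 material, comes down to building and applying a time-frequency mask over a spectrogram. Here is a minimal sketch of the masking step using librosa (listed under Software below); the toy mask here is NOT the paper's repetition-based mask, just an illustration of the mechanics:

import numpy as np
import librosa

# Load a short example clip that ships with librosa.
y, sr = librosa.load(librosa.ex('trumpet'))

# The STFT gives a complex time-frequency representation.
S = librosa.stft(y)

# Toy "separation": keep only the bins louder than the median bin
# in their frame. (REPET instead derives the mask from a model of
# the repeating background.)
mask = np.abs(S) > np.median(np.abs(S), axis=0, keepdims=True)
foreground = librosa.istft(S * mask, length=len(y))

print(foreground.shape)  # same length as the input signal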

EXTRA READING (more items will be added to this list)

Deep Learning

Chapter 4 of Machine Learning: This is Tom Mitchell's book. Historical overview + explanation of backpropagation of error. It's a good starting point for actually understanding deep nets. IT'S WORTH 2 EXTRA-CREDIT READINGS. THE CATCH IS THAT YOU HAVE TO WRITE 2 PAGES TO GET THE 2 POINTS.

Mel-frequency Cepstral Coefficients

The dummy’s guide to MFCC - an easy, high-level read. Start with this.

From Frequency to Quefrency: A History of the Cepstrum - a historical analysis of the uses of cepstrums
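
Once you have the idea from the readings, computing MFCCs is a few lines with librosa (listed under Software below); a minimal sketch:

import librosa

# Load a short example clip that ships with librosa.
y, sr = librosa.load(librosa.ex('trumpet'))

# 13 MFCCs per frame is a common default for timbre features.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(mfccs.shape)  # (13, number_of_frames)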

Human Perception

Recovering sound sources from embedded repetition - This is a paper on how humans actually listen to and parse audio based on repetition. Read any time.

The MP3

Paper coming…as soon as I find one I like.

Audio production

Audealize: Crowdsourced Audio Production Tools - This describes a new way to make audio FX easy to use. Also, try the demo app.

Audio source separation (demixing)

Deep clustering: Discriminative embeddings for segmentation and separation - Don’t try to read this till you know something about deep learning.

Cerberus - A system that separates sounds in a musical mix and also transcribes them. Wait till you've gotten some deep learning education. Also, watch the related video.

Audio search and recommendation

Lessons learned building a large music recommender system (This one is a video) - This is a talk by the chief researcher on music recommendation at Pandora. Watch any time.

An Industrial-Strength Audio Search Algorithm (Shazam) - Describes how the popular Shazam app for music audio ID works.

OtoMechanic: Auditory Automobile Diagnostics via Query-by-Example - A deep-learning based sound ID system. Also, watch the related video.

A Human-in-the-Loop System for Sound Event Detection and Annotation - An interactive sound-event labeler. Also, watch the related video.

Pitch tracking

Yin: a fundamental frequency estimator for speech and music - This is, perhaps, the most popular pitch tracker.

Crepe: A Convolutional Representation for Pitch Estimation - A deep learning pitch tracker that improves on Yin.
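
librosa ships an implementation of Yin, so you can experiment before (or while) reading the paper; a minimal sketch (Crepe is a separate pip package with its own API):

import librosa

y, sr = librosa.load(librosa.ex('trumpet'))

# librosa's implementation of the Yin estimator; fmin/fmax bound
# the fundamental-frequency search range (here roughly C2 to C7).
f0 = librosa.yin(y, fmin=librosa.note_to_hz('C2'),
                 fmax=librosa.note_to_hz('C7'), sr=sr)

print(f0.mean())  # rough average of the estimated pitch track, in Hz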

Places to get ideas

EECS 352 Final projects from 2017 and 2015

The infinite jukebox

Google’s Project Magenta

Facebook’s Universal Music Translation

A Coursera course on pitch tracking

Datasets

U of Iowa’s Music Instrument Samples Dataset

The SocialFX data set of word descriptors for audio

VocalSet: a singing voice dataset consisting of 10.1 hours of monophonic recorded audio of professional singers

VocalSketch: thousands of vocal imitations of a large set of diverse sounds

Bach10: audio recordings of each part and the ensemble of ten pieces of four-part J.S. Bach chorales

The Million Song Dataset

Software

Python Utilities for Detection and Classification of Acoustic Scenes

Librosa: audio and music processing in Python

Essentia: an open-source music analysis toolkit that includes many feature extractors and pre-trained models for estimating, e.g., beats per minute, mood, and genre

Yaafe - audio features extraction toolbox

The Northwestern University Source Separation Library (nussl)

Sonic Visualiser music visualization software

LilyPond, open-source music notation software

SoundSlice guitar tab and notation website
