
CS 352 MACHINE PERCEPTION OF MUSIC AND AUDIO

Northwestern University Winter 2026


Course Description

This course covers machine extraction of structure from audio files, in areas such as source separation (unmixing audio recordings into their individual component sounds), sound object recognition (labeling sounds), melody tracking, beat tracking, and perceptual mapping of audio to machine-quantifiable measures.

This course is approved for the Interfaces breadth requirement and the project requirement in the CS curriculum.

Prerequisites: prior programming experience sufficient to complete the laboratory assignments in Python, implementing algorithms and using libraries without being taught how (there is no Python language instruction in this course). Having taken EECS 211 and 214 would demonstrate this experience.

Course Textbook

Fundamentals of Music Processing, by Meinard Müller

Time & Place

Lecture: Tue & Thu, 3:30-4:50pm CST, in 2122 Sheridan Rd, Classroom 250

Instructors & Office Hours

Prof. Bryan Pardo 10am - 11am Thursdays in Mudd 3115

TA Annie Chu 11am - 1pm Tuesdays in Mudd 3202

Peer Mentor EJ Van De Grift 2pm - 3pm Wednesdays in Mudd 3108

Postdoc Jason Smith will help guide final projects.

Course Policies

Questions outside of class

Please use CampusWire for class-related questions.

Grading Policy

You will be graded on a 100-point scale (93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).

Every assignment is worth 20 points, and there are 5 assignments (including the final project). Your final grade will be the sum of your midterm grade and your 4 highest assignment grades: 20 + (4 × 20) = 100 points. This means you can skip any one assignment.
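For concreteness, here is a minimal sketch of that arithmetic in Python (the scores below are made-up examples, not real data):

```python
# Hypothetical scores, each out of 20 points.
midterm = 18
assignments = [20, 17, 0, 19, 16]  # the 0 is a skipped assignment

# Keep the 4 highest assignment scores; the lowest (here, the skip) drops out.
top_four = sorted(assignments, reverse=True)[:4]

# 18 + (20 + 19 + 17 + 16) = 90 points, an A- on the scale above.
final_grade = midterm + sum(top_four)
print(final_grade)
```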

Homework and reading assignments are solo assignments and must be your original work.

AI policy

You are expected to write your own code and write up your own answers to questions. This means you. Not ChatGPT or Gemini or Copilot. This is an optional class you are (presumably) taking because you're interested, so put in the time to learn this material yourself.

Submitting assignments

Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.

Extra credit

Course Calendar

Week | Date | Topic | Assignment | Points
-----|------|-------|------------|-------
1 | Tue Jan 6 | Course intro, Recording basics | |
1 | Thu Jan 8 | Frequency & Pitch, Tuning Systems | |
2 | Tue Jan 13 | Loudness & Amplitude | |
2 | Thu Jan 15 | Fourier Transforms & Spectrograms | |
3 | Tue Jan 20 | Convolution & Filtering | HW 1: Audio Basics | 20
3 | Thu Jan 22 | Convolution & FFT notebooks | |
4 | Tue Jan 27 | MFCCs and Chromagrams | |
4 | Thu Jan 29 | Sound Object Labeling | HW 2: Spectrograms, Masking | 20
5 | Tue Feb 3 | Self Similarity & MFCC & Chroma notebooks | |
5 | Thu Feb 5 | MIDTERM REVIEW | |
6 | Tue Feb 10 | Pitch Tracking | MIDTERM | 20
6 | Thu Feb 12 | Deep Learning | |
7 | Tue Feb 17 | Deep Learning | HW 3: Infinite Jukebox | 20
7 | Thu Feb 19 | Embeddings, VoiceID, Source Separation | |
8 | Tue Feb 24 | Cross Modal Embeddings & Embeddings Notebook | |
8 | Thu Feb 26 | Final project group formation & proposals | HW 4: Using Embeddings | 20
9 | Tue Mar 3 | No class: Zoom meetings with project groups, by appointment | Project proposal due |
9 | Thu Mar 5 | Current research in music & audio processing (Annie) | |
10 | Tue Mar 10 | Current research in music & audio processing | |
10 | Thu Mar 12 | No class: Zoom meetings with project groups, by appointment | Project meeting |
11 | Wed Mar 18 | Final project presentations, 7-9pm | Final project | 20

Course Reading

Fundamentals of Music Processing, Chapter 1

Fundamentals of Music Processing, Chapter 2 & Section 3.1

Fundamentals of Music Processing, Chapter 4

Fundamentals of Music Processing, Chapter 6

Fundamentals of Music Processing, Chapter 7

REPET for Background/Foreground Separation in Audio

Chapter 4 of Machine Learning: This is Tom Mitchell's book, a historical overview plus an explanation of backpropagation of error. It's a good starting point for actually understanding deep nets.

YIN, a fundamental frequency estimator for speech and music - This is, perhaps, the most popular pitch tracker. (A short usage sketch follows this reading list.)

CREPE: A Convolutional Representation for Pitch Estimation - A deep learning pitch tracker that improves on YIN.

The dummy’s guide to MFCC - an easy, high-level read. Start with this.

From Frequency to Quefrency: A History of the Cepstrum - a historical account of the uses of the cepstrum.

Recovering sound sources from embedded repetition - This is a paper on how humans actually listen to and parse audio based on repetition. Read any time.
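As a companion to the YIN and CREPE readings above, here is a minimal pitch-tracking sketch using librosa's pyin function, which implements pYIN, a probabilistic extension of the YIN algorithm (the filename is a placeholder):

```python
import librosa
import numpy as np

# Load a mono recording (the filename here is a placeholder).
y, sr = librosa.load("melody.wav", sr=22050, mono=True)

# pYIN returns one f0 estimate per analysis frame, plus a voiced/unvoiced
# decision and the probability that each frame is voiced.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),  # ~65 Hz: low end of the search range
    fmax=librosa.note_to_hz("C7"),  # ~2093 Hz: high end of the search range
    sr=sr,
)

# Frame times in seconds, useful for plotting f0 against a spectrogram.
times = librosa.times_like(f0, sr=sr)

# Summarize the pitch of the voiced frames only (f0 is NaN where unvoiced).
print("median f0 (Hz):", np.nanmedian(f0[voiced_flag]))
```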

Places to get ideas

EECS 352 Final projects from 2017 and 2015

The infinite jukebox

Google’s Project Magenta

Facebook’s Universal Music Translation

A Coursera course on pitch tracking

Datasets

The University of Iowa Musical Instrument Samples dataset

The SocialFX dataset of word descriptors for audio

VocalSet: a singing voice dataset consisting of 10.1 hours of monophonic recorded audio of professional singers

VocalSketch: thousands of vocal imitations of a large set of diverse sounds

Bach10: audio recordings of each part and the ensemble of ten pieces of four-part J.S. Bach chorales

The Million Song Dataset

Software

Python Utilities for Detection and Classification of Acoustic Scenes

Librosa: audio and music processing in Python (a short usage sketch follows this list)

Essentia: an open-source music analysis toolkit that includes many feature extractors and pre-trained models for extracting, for example, beats per minute, mood, and genre

Yaafe: an audio feature extraction toolbox

Sonic Visualiser: music visualization software

LilyPond: open-source music notation software

Soundslice: a guitar tab and notation website
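Since the homework is in Python and uses libraries like Librosa, here is a minimal sketch of computing three representations that come up in the first half of the course (a log-magnitude spectrogram, MFCCs, and a chromagram); the filename is a placeholder:

```python
import librosa
import numpy as np

# Load audio (the filename is a placeholder); librosa resamples to sr.
y, sr = librosa.load("example.wav", sr=22050, mono=True)

# Short-time Fourier transform -> complex spectrogram (freq bins x frames).
stft = librosa.stft(y, n_fft=2048, hop_length=512)

# Log-magnitude spectrogram in decibels, the usual way to visualize audio.
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# 20 MFCCs per frame: a compact timbral summary of each analysis window.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# 12-bin chromagram: spectral energy folded onto the 12 pitch classes.
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

print(spec_db.shape, mfccs.shape, chroma.shape)
```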
