Course materials for the Interactive Audio Lab


DEEP LEARNING, Northwestern University EECS 495-050, Spring 2017

Location: Technological Institute LG52

Day/Time: Wednesdays, 3:00pm - 5:50pm

Instructor: Bryan Pardo

Course Description

In this course students will study deep learning architectures such as restricted Boltzmann machines, deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks (LSTMs, GRUs). They will read original research papers that describe the algorithms and how they have been applied to fields like computer vision, machine translation, automatic speech recognition, and audio event recognition.

Course Calendar

| Week | Date | Topic | Deliverable | Points |
|------|------|-------|-------------|--------|
| | Every week | | Class Participation | 2 per week * 10 weeks = 20 |
| | Once | | Present a topic area | 20 |
| 1 | March 29 | Perceptrons, MLPs | | |
| 2 | April 5 | RBMs, Deep Belief Networks | | |
| 3 | April 12 | Regularization & Optimization | | |
| 4 | April 19 | Convolutional Networks | | |
| 5 | April 26 | Image Net and Descendants | 10 paper reviews | 20 |
| 6 | May 3 | Deep and Adversarial Networks | | |
| 7 | May 10 | Recurrent Networks | | |
| 8 | May 17 | Long Short Term Memory | | |
| 9 | May 24 | Reinforcement Learning | 10 paper reviews | 20 |
| 10 | May 31 | CTC and Highway LSTM | | |
| 11 | June 7 | | Final Project | 20 |

Anaconda: The most popular Python distro for machine learning

Scikit Learn: The most popular machine learning Python package

TensorFlow: The most popular Python DNN package

Keras: A nice Python API for TensorFlow

My guide to installing Keras and TensorFlow on MacOS

Susho says PyTorch is the way to go for RNNs

Course Reading

Week 1: March 29 Perceptrons and Multilayer Perceptrons

Chapter 1 of Parallel Distributed Processing

Chapter 4 of Machine Learning

Chapter 6 of Deep Learning

Blog: A hacker’s guide to neural nets

Week 2: April 5 Restricted Boltzmann Machines and Deep Belief Networks

Chapter 16 and 20 of Deep Learning Book (just the bits on RBMs)

Scaling Learning Algorithms towards AI

Week 3: April 12 Regularization and Optimization

Chapters 7 and 8 of the Deep Learning Book

Why Momentum Really Works

Kezhen: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Batch Normalization

Self-normalizing networks

A blog describing how to easily implement Self-normalizing networks

Week 4: April 19 Convolutional Networks

Chapter 9 of Deep Learning: Convolutional Networks

Joe: Generalization and Network Design Strategies

Kezhen: Convolutional Networks for Images, Speech, and Time-Series

Bryan: Network In Network

Going Deeper with Convolutions

Week 5: April 26 Image Net and Descendants

The Imagenet Data

Wei: ImageNet Classification with Deep Convolutional Neural Networks

Yiming: What I learned from Competing against Convnet on Imagenet

Blair: Visualizing and Understanding Convolutional Networks

A guide to convolution arithmetic for deep learning

Deconvolution and Checkerboard Artifacts

Bryan: Rich feature hierarchies for accurate object detection and semantic segmentation

Selective Search for Object Recognition

Week 6: May 3 Going Deep and Getting Adversarial

Wei: Very Deep Convolutional Networks for Large-Scale Image Recognition

Huayi: Deep Residual Learning for Image Recognition

Fast R-CNN (Deep ResNets uses this)

Joe: Explaining and Harnessing Adversarial Examples

Sushobhan: Generative Adversarial Nets

Julia Evans’ walkthrough of generating adversarial examples

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Week 7: May 10 Recurrent Networks

Chapter 10 of Deep Learning

A tutorial on Back Propagation Through Time

On the Difficulties of Training Recurrent Networks

Understanding LSTMs

Ben: Long Short-Term Memory

Deep Visual-Semantic Alignments for Generating Image Descriptions


Experiments in Handwriting with a LSTM Neural Network

Week 8: May 17 Long Short Term Memory Networks

The paper that introduced GRUs: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

CJ: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Yiming: Learning to Forget: Continual Prediction with LSTM

CJ: Sequence to Sequence Learning with Neural Networks

The NIPS talk on Sequence to Sequence Learning

Week 9: May 24 Reinforcement Learning & Attention

Reinforcement Learning: An Introduction, Chapters 3 and 6

My lecture notes on reinforcement learning (easier intro)

Sushobhan: Attention and Augmented Recurrent Neural Networks

Max: Playing Atari with Deep Reinforcement Learning

Huayi: Mastering the game of Go with deep neural networks and tree search

Week 10: May 31 Highway Networks, Speech Recognition

Highway Networks (useful for understanding Highway LSTMs)

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

The Rabiner HMM tutorial

A 2nd quarter of deep learning would focus on

Using Deepnets as intuition for search directions

DeepMath - Deep Sequence Models for Premise Selection

Recursive Cortical Networks (better than Deep Nets??)

A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

Shrinking Networks: Teacher-student networks

Do Deep Nets Really Need to be Deep?

Removing Recursion

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

Quasi Recurrent Neural Networks

Cross-modal learning

Cross Modal Distillation for Supervision Transfer

Unsupervised learning

Tutorial on Variational Autoencoders

External memory

Neural Turing Machines

Attention-based models

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Listen, Attend and Spell

Describing Videos by Exploiting Temporal Structure

Modelling Auditory Attention

Attention is All You Need

The Illustrated Transformer (This is a tutorial that explains the concepts in “Attention is All You Need”)

Audio Source Separation

Deep clustering: Discriminative embeddings for segmentation and separation


Audio scene labeling and source separation

SoundNet: Learning Sound Representations from Unlabeled Video



Unsupervised Cross-Modal Deep-Model Adaptation for Audio-Visual Re-Identification With Wearable Cameras

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Other stuff with sound

Deep Cross-Modal Audio-Visual Generation

Dialog Systems

Building End-to-End Dialogue Systems Using Generative Hierarchical Neural Network Models

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Slides explaining what an iVector is

Analysis of I-vector Length Normalization in Speaker Recognition Systems

Highway Networks (useful for understanding Highway LSTMs)

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

The Rabiner HMM tutorial

Speech and Generation Systems

WaveNet: A Generative Model for Raw Audio

Deep Voice: Real-time Neural Text-to-Speech

The Deep Voice Talk at ICML

SampleRNN: Bengio’s take on generating speech

Tacotron is the speech generation system used at Google

Tacotron 2 is the 2018 speech generation system from Google

Check out Lyrebird

Image generation

Pixel Recurrent Neural Networks (WaveNet is based on this)

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Conditional Image Generation with PixelCNN Decoders

Estimating Confidence

Learning Confidence for Out-of-Distribution Detection in Neural Networks


The Reddit discussion on the current state of the art in estimating confidence

Why aren’t deep nets overfitting?

Understanding deep learning requires rethinking generalization

The Implicit Bias of Gradient Descent on Separable Data