course-deep-learning

DEEP LEARNING: Northwestern University CS 396/496 Winter 2025

Class Day/Time

Tuesdays and Thursdays, 9:30am - 10:50am Central Time

Location

2122 Sheridan Rd Classroom 250

Instructors

Professor: Bryan Pardo

TAs: Hugo Flores Garcia, Patrick O’Reilly

Peer Mentors: Jerry Cao, Saumya Pailwan, Anant Poddar, Nathan Pruyne

Office hours

Anant Poddar M 3pm - 5pm Mudd 3rd floor front counter

Nathan Pruyne M 5pm - 7pm Mudd 3207

Bryan Pardo TU 11am - noon Mudd 3115

Saumya Pailwan W 2pm - 4pm Mudd 3rd floor front counter

Jerry Cao W 5pm - 7pm Mudd 3rd floor front counter

Patrick O’Reilly TH 12pm - 1pm, 2pm - 3pm Mudd 3207

Hugo Flores Garcia TH 2pm - 4pm Mudd 3207

Course Description

This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other covered topics include regularization, loss functions and gradient descent.

Learning will take place in a practical context: implementing networks with these architectures in a modern programming environment, PyTorch. Homework consists of a mixture of programming assignments, reviews of research papers, experiments with deep learning architectures, and theoretical questions about deep learning.

Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. Students completing the course should also be able to understand current research in deep networks.

Course Prerequisites

This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.

Course textbook

The primary text is the Deep Learning book. This reading will be supplemented by reading key papers in the field.

Course Policies

Questions outside of class

Please use CampusWire for class-related questions.

Submitting assignments

Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.

Grading Policy

You will be graded on a 100-point scale (e.g., 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-…and so on).
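
As a rough illustration of the scale above, here is a minimal Python sketch that maps a point total to a letter. It covers only the cutoffs actually listed in this policy; the scale below B- is not spelled out here, so the function returns None for those totals rather than guessing. The names STATED_CUTOFFS and letter_grade are made up for this example, and treating a non-integer total by its lower bound (no rounding) is an assumption, not course policy.

```python
# Cutoffs stated in the grading policy. The scale below B- is not given
# in the syllabus, so it is deliberately left out here.
STATED_CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-")]


def letter_grade(points):
    """Return the letter for a point total, or None below the stated range.

    Assumption: a total is compared against each lower bound without rounding.
    """
    for cutoff, letter in STATED_CUTOFFS:
        if points >= cutoff:
            return letter
    return None  # below 80: not specified in the stated scale


if __name__ == "__main__":
    for total in (95, 91, 84, 79):
        print(total, letter_grade(total))
```

Your official grade is whatever appears on Canvas; this is only a way to read the scale.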

Homework and reading assignments are solo assignments and must be your own original work. Use of large language models for answer generation is not allowed.

Extra Credit

There is an extra credit assignment worth 10 points. More details soon.

Course Calendar

Week	Day and Date	Topic (tentative)	Due today	Points
1	Tue Jan 7	Perceptrons
1	-	Notebook 1: perceptrons
1	Thu Jan 9	Gradient descent
2	Tue Jan 14	Backpropagation of error
2	-	Notebook 2: MLP in PyTorch
2	Thu Jan 16	Multi-layer perceptrons
3	Tue Jan 21	Convolutional nets	Homework 1	15
3	-	Notebook 3: Image Classification
3	Thu Jan 23	Regularization
4	Tue Jan 28	Data augmentation & generalization
4	-	Notebook 4: CNNs & Logging
4	Thu Jan 30	Adversarial examples
4	-	Notebook 5: adversarial examples
5	Tue Feb 4	Generative adversarial networks (GANs)	Homework 2	15
5	-	Notebook 6: GANs
5	Thu Feb 6	Catch up day
6	Tue Feb 11	MIDTERM	Midterm	20
6	Thu Feb 13	Unsupervised methods
6	-	Notebook 7: autoencoders
7	Tue Feb 18	Recurrent nets
7	Thu Feb 20	LSTMs	Homework 3	15
7	-	Notebook 8: RNNs
8	Tue Feb 25	Deep RL
8	Thu Feb 27	Reinforcement learning (RL)
9	Tue Mar 4	Pong with Reinforcement learning (RL)
9	Thu Mar 6	Attention networks	Homework 4	15
10	Tue Mar 11	Transformers
10	Thu Mar 13	FINAL EXAM	Final Exam	20
11	Thu Mar 20	Extra Credit Due	Extra Credit	10

Links

Helpful Programming Packages

Anaconda is the most popular Python distribution for machine learning.

PyTorch is Facebook’s popular deep learning package. My lab uses this.

Tensorboard is what my lab uses to visualize how experiments are going.

TensorFlow is Google’s most popular Python DNN package.

Keras is a nice programming API that works with TensorFlow.

JAX is an alpha package from Google that allows differentiation of NumPy code and includes an optimizing compiler for working on tensor processing units.

Trax is Google Brain’s DNN package. It focuses on transformers and is implemented on top of JAX.

MXNet is Apache’s open-source DL package.

Helpful Books on Deep Learning

Deep Learning is THE book on Deep Learning. One of the authors won the Turing Award for his work on deep learning.

Dive Into Deep Learning provides example code and instruction for how to write DL models in PyTorch, TensorFlow and MXNet.

Computing Resources

Google’s Colab offers free GPU time and a nice environment for running Jupyter notebook-style projects. For $10 per month, you also get priority access to GPUs and TPUs.

Amazon’s SageMaker offers hundreds of free hours for newbies.

The CS Department Wilkinson Lab just got 22 new machines that each have a graphics card suitable for deep learning. They should be remotely accessible and running Linux with all the Python packages needed for deep learning.

Course Reading

Book Chapter Readings

Chapter 4 of Machine Learning: READ THIS FIRST. This is Tom Mitchell’s book. Historical overview + explanation of backprop of error. It’s a good starting point for actually understanding deep nets. Read the whole chapter.
What are Gradients, Jacobians, and Hessians?: This isn’t a book chapter, but if you don’t know what a gradient, Jacobian or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book.
Chapter 4 of the Deep Learning Book: This covers basics of gradient-based optimization. Read through Section 4.3.
Chapter 6 of Deep Learning: This covers the basics from a more modern perspective. To my mind, if you’ve read Tom Mitchell, it is mostly useful for covering different kinds of activation functions. Read through Section 6.4
Chapter 7 of the Deep Learning Book: Covers regularization. The minimal useful read is sections 7.1 and 7.4…but this assumes you’ll read the papers some of the other sections are based on. Those papers are in the additional readings. If you don’t read those, then I’d add 7.9, 7.12, 7.13.
Chapter 8 of the Deep Learning Book: This covers optimization. Read through Section 8.5. Beyond that, it is stuff outside the scope of the class.
Chapter 9 of Deep Learning: Convolutional networks. Read 9.1 through 9.4 and 9.10

—— no book chapter below this line will be expected for the midterm ——

Understanding LSTMs: A simple (maybe too simple?) walk-through of LSTMs. Good to read before trying the book chapter on this topic.
Chapter 10 of Deep Learning: RNNs and LSTMs.
Reinforcement Learning: An Introduction, Chapters 3 and 6: This gives you the basics of what reinforcement learning (RL) is about.

Additional Readings

Generalization and Network Design Strategies: The original 1989 paper where LeCun describes Convolutional networks.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Explaining and Harnessing Adversarial Examples: This paper got the ball rolling by pointing out how to make images that look good but are consistently misclassified by trained deep nets.
Generative Adversarial Nets: The paper that introduced GANs.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Explains a widely-used regularizer

—— no additional reading below this line will be expected for the midterm ——

DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: This is an end-to-end model. Many papers build on this. The homework uses the discriminator approach from this paper.
Long Short-Term Memory: The original 1997 paper introducing the LSTM.
Playing Atari with Deep Reinforcement Learning: A key paper that showed how reinforcement learning can be used with deep nets. This is discussed in class.
Deep Reinforcement Learning: Pong from Pixels: This is the blog we base part of Homework 4 on.
The Illustrated Transformer: A good walkthrough that helps a lot with understanding transformers
Attention is All You Need: The paper that introduced transformers, which are a popular and more complicated kind of attention network.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: A widely-used language model based on Transformer encoder blocks.
The Illustrated GPT-2: Not a paper, but a good overview of GPT-2 and its relation to Transformer decoder blocks.
