course-deep-learning

DEEP LEARNING: Northwestern University CS 396/496 Winter 2025


Class Day/Time

Tuesdays and Thursdays, 9:30am - 10:50am Central Time

Location

2122 Sheridan Rd Classroom 250

Instructors

Professor: Bryan Pardo

TAs: Hugo Flores Garcia, Patrick O’Reilly

Peer Mentors: Jerry Cao, Saumya Pailwan, Anant Poddar, Nathan Pruyne

Office hours

Anant Poddar M 3pm - 5pm Mudd 3rd floor front counter

Nathan Pruyne M 5pm - 7pm Mudd 3207

Bryan Pardo TU 11am - noon Mudd 3115

Saumya Pailwan W 2pm - 4pm Mudd 3rd floor front counter

Jerry Cao W 5pm - 7pm Mudd 3rd floor front counter

Patrick O’Reilly TH 12pm - 1pm and 2pm - 3pm Mudd 3207

Hugo Flores Garcia TH 2pm - 4pm Mudd 3207

Course Description

This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other topics covered include regularization, loss functions, and gradient descent.

Learning will be in the practical context of implementing networks using these architectures in a modern programming environment: Pytorch. Homework consists of a mixture of programming assignments, reviews of research papers, experiments with deep learning architectures, and theoretical questions about deep learning.

Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. They should also be able to read and understand current research on deep networks.
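To give a concrete feel for the kind of code this involves, here is a minimal sketch of defining and training a small network in Pytorch. The architecture, data, and hyperparameters below are made up for illustration; they are not taken from any assignment.

```python
# A minimal sketch of building and training a small multi-layer perceptron in Pytorch.
# All layer sizes, data, and hyperparameters are illustrative, not from the homework.
import torch
import torch.nn as nn

# Tiny MLP: 10 inputs -> 32 hidden units -> 2 class scores
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),
    nn.Linear(32, 2),
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Fake data standing in for a real dataset
x = torch.randn(64, 10)          # 64 examples, 10 features each
y = torch.randint(0, 2, (64,))   # 64 integer class labels

for epoch in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    logits = model(x)            # forward pass
    loss = loss_fn(logits, y)    # compare predictions to labels
    loss.backward()              # backpropagation of error
    optimizer.step()             # one gradient descent update
```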

Course Prerequisites

This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.

Course textbook

The primary text is the Deep Learning book. This reading will be supplemented by reading key papers in the field.

Course Policies

Questions outside of class

Please use CampusWire for class-related questions.

Submitting assignments

Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.

Grading Policy

You will be graded on a 100-point scale (e.g., 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).

Homework and reading assignments are solo assignments and must be your own original work. Use of large language models for answer generation is not allowed.

Extra Credit

There is an extra credit assignment worth 10 points. More details soon.

Course Calendar


Week Day and Date Topic (tentative) Due today Points
1 Tue Jan 7 Perceptrons    
1 - Notebook 1: perceptrons    
1 Thu Jan 9 Gradient descent    
2 Tue Jan 14 Backpropagation of error    
2 - Notebook 2: MLP in Pytorch    
2 Thu Jan 16 Multi-layer perceptrons    
3 Tue Jan 21 Convolutional nets Homework 1 15
3 - Notebook 3: Image Classification    
3 Thu Jan 23 Regularization
4 Tue Jan 28 Data augmentation & generalization    
4 - Notebook 4: CNNs & Logging    
4 Thu Jan 30 Adversarial examples    
4 - Notebook 5: adversarial examples    
5 Tue Feb 4 Generative adversarial networks (GANs) Homework 2 15
5 - Notebook 6: GANs    
5 Thu Feb 6 Catch up day    
6 Tue Feb 11 MIDTERM Midterm 20
6 Thu Feb 13 Unsupervised methods    
6 - Notebook 7: autoencoders    
7 Tue Feb 18 Recurrent nets
7 Thu Feb 20 LSTMs Homework 3 15
7 - Notebook 8: RNNs    
8 Tue Feb 25 Deep RL    
8 Thu Feb 27 Reinforcement learning (RL)    
9 Tue Mar 4 Pong with Reinforcement learning (RL)    
9 Thu Mar 6 Attention networks Homework 4 15
10 Tue Mar 11 Transformers    
10 Thu Mar 13 FINAL EXAM Final Exam 20
11 Thu Mar 20 Extra Credit Due Extra Credit 10


Helpful Programming Packages

Anaconda is the most popular Python distribution for machine learning.

Pytorch is Facebook’s popular deep learning package. My lab uses this.

Tensorboard is what my lab uses to visualize how experiments are going (a minimal logging sketch appears after this list).

Tensorflow is Google’s most popular Python DNN package.

Keras is a nice programming API that works with Tensorflow.

JAX is a package from Google that allows differentiation of NumPy code and provides an optimizing compiler for working on tensor processing units.

Trax is Google Brain’s DNN package. It focuses on transformers and is implemented on top of JAX.

MXNet is Apache’s open-source DL package.
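Since the calendar includes a notebook on logging (Notebook 4), here is a minimal sketch of writing training curves to Tensorboard from Pytorch. The run directory, tag name, and logged values are placeholders chosen for illustration.

```python
# Minimal sketch of logging scalars to Tensorboard from Pytorch.
# The run directory, tag name, and values below are placeholders.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")   # hypothetical run directory

for step in range(100):
    fake_loss = 1.0 / (step + 1)                 # stand-in for a real training loss
    writer.add_scalar("train/loss", fake_loss, step)

writer.close()
# View the curves with:  tensorboard --logdir runs
```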

Helpful Books on Deep Learning

Deep Learning is THE book on Deep Learning. One of the authors won the Turing Award for his work on deep learning.

Dive Into Deep Learning provides example code and instruction for how to write DL models in Pytorch, Tensorflow, and MXNet.

Computing Resources

Google’s Colab offers free GPU time and a nice environment for running Jupyter notebook-style projects. For $10 per month, you also get priority access to GPUs and TPUs.

Amazon’s SageMaker offers hundreds of free hours for new users.

The CS Department’s Wilkinson Lab recently added 22 new machines, each with a graphics card suitable for deep learning; they should be remotely accessible and running Linux with all the Python packages needed for deep learning.

Course Reading


Book Chapter Readings

  1. Chapter 4 of Machine Learning: READ THIS FIRST. This is Tom Mitchell’s book. It gives a historical overview and an explanation of backpropagation of error, and it is a good starting point for actually understanding deep nets.

  2. What are Gradients, Jacobians, and Hessians? This isn’t a book chapter, but if you don’t know what a gradient, Jacobian, or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book (a small Pytorch sketch of all three appears after this list).

  3. Chapter 4 of the Deep Learning Book: This covers basics of gradient-based optimization.

  4. Chapter 6 of Deep Learning: This covers the basics from a more modern perspective. To my mind, if you’ve read Tom Mitchell, it is mostly useful for covering different kinds of activation functions.

  5. Chapter 7 of the Deep Learning Book: Covers regularization.

  6. Chapter 8 of the Deep Learning Book: This covers optimization.

  7. Chapter 9 of Deep Learning: Convolutional networks.

  8. Understanding LSTMs: A simple (maybe too simple?) walk-through of LSTMs. Good to read before trying the book chapter on this topic.

  9. Chapter 10 of Deep Learning: RNNs and LSTMs.

  10. Reinforcement Learning: An Introduction, Chapters 3 and 6: This gives you the basics of what reinforcement learning (RL) is about.
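If the terms in reading 2 are unfamiliar, the sketch below shows one way to compute a gradient, Jacobian, and Hessian with Pytorch’s autograd. The functions f and g are arbitrary examples chosen only for illustration.

```python
# Sketch: computing a gradient, Jacobian, and Hessian with Pytorch autograd.
# The functions f and g are arbitrary examples, not from any reading or assignment.
import torch
from torch.autograd.functional import jacobian, hessian

def f(x):
    # scalar-valued function of a 3-dimensional input
    return (x ** 2).sum() + x[0] * x[1]

def g(x):
    # vector-valued function of a 3-dimensional input
    return torch.stack([x[0] * x[1], x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

grad = torch.autograd.grad(f(x), x)[0]   # gradient of f: shape (3,)
J = jacobian(g, x)                       # Jacobian of g: shape (2, 3)
H = hessian(f, x)                        # Hessian of f: shape (3, 3)
```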

Additional Readings

  1. Generalization and Network Design Strategies: The original 1989 paper where LeCun describes Convolutional networks.

  2. Explaining and Harnessing Adversarial Examples: This paper got the ball rolling by pointing out how to make images that look good but are consistently misclassified by trained deep nets.

  3. Generative Adversarial Nets: The paper that introduced GANs.

  4. Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Explains a widely-used regularizer.

  5. DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: This is an end-to-end model that many papers build on. The homework uses the discriminator approach from this paper.

  6. Long Short-Term Memory: The original 1997 paper introducing the LSTM.

  7. Playing Atari with Deep Reinforcement Learning: A key paper that showed how reinforcement learning can be used with deep nets. This is discussed in class.

  8. Deep Reinforcement Learning: Pong from Pixels: This is the blog we base part of Homework 4 on.

  9. The Illustrated Transformer: A good walkthrough that helps a lot with understanding transformers

  10. Attention is All You Need: The paper that introduced transformers, which are a popular and more complicated kind of attention network (a minimal sketch of the core attention operation appears after this list).

  11. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: A widely-used language model based on Transformer encoder blocks.

  12. The Illustrated GPT-2: Not a paper, but a good overview of GPT-2 and its relation to Transformer decoder blocks.
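As a companion to readings 9 and 10, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer. The batch size, sequence length, and dimension below are arbitrary illustrative values.

```python
# Minimal sketch of scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
# Shapes below are arbitrary illustrative values.
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q, k, v: tensors of shape (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)                 # each row sums to 1
    return weights @ v                                  # weighted sum of the values

q = torch.randn(2, 5, 16)   # 2 sequences, 5 tokens, 16-dimensional queries
k = torch.randn(2, 5, 16)
v = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(q, k, v)             # shape (2, 5, 16)
```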
