| Top | Calendar | Links | Readings |
Tuesdays and Thursdays, 3:30pm - 4:50pm Central Time
Technological Institute M345
Professor: Bryan Pardo, Monday 10am-11am, on the class Zoom link (https://northwestern.zoom.us/j/92331213369)
Teaching Assistant: Chongyang Gao, Wednesday 10am-11am and 12pm-1pm, first-floor lobby of Mudd
Teaching Assistant: Hong-Yu Chen, Monday 6pm-8pm, Mudd 3532
Peer Mentor: Kaitlyn Wang, Monday 3pm-5pm, first-floor lobby of Mudd
Peer Mentor: Kefan Yu, Tuesday 9am-11am, Mudd 3532
Peer Mentor: EJ Van de Grift, Friday 1pm-3pm, Mudd 3532
Peer Mentor: Harry Wang, Thursday 9am-11am, first-floor lobby of Mudd
This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other covered topics include regularization, loss functions, and gradient descent.
Learning will take place in the practical context of implementing networks with these architectures in a modern programming environment: PyTorch. Homework consists of a mixture of programming assignments, reviews of research papers, experiments with deep learning architectures, and theoretical questions about deep learning.
Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. They should also be able to understand current research in deep networks.
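To give a taste of what "building a network from scratch" means, here is a minimal sketch of the first architecture on the list: a single perceptron trained with the classic perceptron update rule. This is illustrative code written for this page, not course-provided material.

```python
# A single perceptron with a step activation, trained with the classic
# perceptron update rule. Learns the logical AND function.

def predict(weights, bias, x):
    """Fire (output 1) if the weighted sum of inputs clears the threshold."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(data, lr=0.1, epochs=20):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            # Update rule: w <- w + lr * error * x  (and likewise for the bias)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND_DATA)
print([predict(w, b, x) for x, _ in AND_DATA])  # -> [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating weight vector; XOR, famously, would not converge, which is one motivation for the multi-layer networks covered later in the course.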
This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.
The primary text is the Deep Learning book. This reading will be supplemented by reading key papers in the field.
Please use CampusWire for class-related questions.
Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
You will be graded on a 100-point scale (e.g., 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).
There are 2 exams (each worth 20 points) and 5 homework assignments (each worth 15 points). Your lowest homework grade will be dropped, so the maximum total is 2 × 20 + 4 × 15 = 100 points.
Homework and reading assignments are solo assignments and must be your own original work.
Use of large language models for answer generation is not allowed.
There are 5 homework assignments, and your lowest homework grade is dropped. There is no other extra credit.
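To make the arithmetic concrete, here is a small sketch of how the points add up once the lowest homework is dropped. This is illustrative only, not an official grading script.

```python
# Illustrative grade calculation: 2 exams x 20 points + 5 homeworks x 15 points,
# with the lowest homework dropped: 2*20 + 4*15 = 100 possible points.

def final_score(exam_scores, homework_scores):
    """exam_scores: two scores out of 20; homework_scores: five scores out of 15."""
    assert len(exam_scores) == 2 and len(homework_scores) == 5
    kept_homeworks = sorted(homework_scores)[1:]  # drop the single lowest score
    return sum(exam_scores) + sum(kept_homeworks)

# A hypothetical student: perfect exams, one bombed homework that gets dropped.
print(final_score([20, 20], [15, 15, 0, 15, 15]))  # -> 100
```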
| Week | Day and Date | Topic (tentative) | Due today | Points |
|---|---|---|---|---|
| 1 | Thu Apr 2 | Class introduction | ||
| 2 | Tue Apr 7 | Perceptrons | ||
| 2 | - | Notebook 1: perceptrons | ||
| 2 | Thu Apr 9 | Gradient descent | ||
| 3 | Tue Apr 14 | Backpropagation of error | ||
| 3 | - | Notebook 2: MLP in Pytorch | ||
| 3 | Thu Apr 16 | Multi-layer perceptrons | ||
| 4 | Tue Apr 21 | Convolutional nets | Homework 1 | 15 |
| 4 | - | Notebook 3: Image Classification | ||
| 4 | Thu Apr 23 | Regularization | ||
| 5 | Tue Apr 28 | Data augmentation & generalization | ||
| 5 | - | Notebook 4: CNNs & Logging | ||
| 5 | Thu Apr 30 | Adversarial examples | Homework 2 | 15 |
| 5 | - | Notebook 5: adversarial examples | ||
| 6 | Tue May 5 | Generative adversarial networks (GANs) | ||
| 6 | - | Notebook 6: GANs | ||
| 6 | Thu May 7 | MIDTERM | Midterm | 20 |
| 7 | Tue May 12 | Unsupervised methods | ||
| 7 | - | Notebook 7: autoencoders | ||
| 7 | Thu May 14 | Recurrent nets | ||
| 7 | - | LSTMs | ||
| 7 | - | Notebook 8: RNNs | ||
| 8 | Tue May 19 | Reinforcement learning (RL) | Homework 3 | 15 |
| 8 | Thu May 21 | Deep RL | ||
| 9 | Tue May 26 | Pong with Reinforcement learning (RL) | ||
| 9 | Thu May 28 | Attention networks | ||
| 10 | Tue Jun 2 | Transformers | Homework 4 | 15 |
| 10 | Thu Jun 4 | Final exam preparation | ||
| 11 | Tue Jun 9 | No class, finals week | Make-up Homework | 15 |
| 11 | Thu Jun 11 | FINAL EXAM | Final Exam | 20 |
Anaconda is a popular Python distribution for machine learning.
PyTorch is Facebook's popular deep learning package. My lab uses this.
TensorBoard is what my lab uses to visualize how experiments are going.
TensorFlow is Google's popular Python DNN package.
Keras is a nice programming API that works with TensorFlow.
JAX is an alpha package from Google that allows differentiation of NumPy code and includes an optimizing compiler for working on tensor processing units (TPUs).
Trax is Google Brain's DNN package. It focuses on transformers and is implemented on top of JAX.
MXNet is Apache's open-source deep learning package.
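All of these packages share one core job: computing gradients of a loss with respect to model parameters so the parameters can be updated by gradient descent. The sketch below shows that loop without any framework, with the gradient derived by hand for a 1-D linear model (an autodiff package would compute it for you). The function and data are made up for illustration.

```python
# What autodiff frameworks automate: the gradient here is derived by hand
# for a 1-D linear model y = w*x, fit by gradient descent.
# Loss: mean squared error  L(w) = (1/N) * sum_i (w*x_i - y_i)^2
# Gradient: dL/dw = (2/N) * sum_i (w*x_i - y_i) * x_i

def fit_slope(xs, ys, lr=0.01, steps=500):
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad  # gradient descent step: move against the gradient
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated by y = 3x
w = fit_slope(xs, ys)
print(round(w, 3))  # converges close to 3.0
```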
Deep Learning is THE book on deep learning. One of the authors won the Turing Award for his work on deep learning.
Dive Into Deep Learning provides example code and instruction for how to write DL models in PyTorch, TensorFlow, and MXNet.
Google’s Colab offers free GPU time and a nice environment for running Jupyter notebook-style projects. For $10 per month, you also get priority access to GPUs and TPUs.
Amazon’s SageMaker offers hundreds of free hours for new users.
The CS Department's Wilkinson Lab just got 22 new machines, each with a graphics card suitable for deep learning. They should be remotely accessible and running Linux with all the Python packages needed for deep learning.
Chapter 4 of Machine Learning: READ THIS FIRST. This is Tom Mitchell’s book. Historical overview + explanation of backprop of error. It’s a good starting point for actually understanding deep nets. Read the whole chapter.
What are Gradients, Jacobians, and Hessians?: This isn’t a book chapter, but if you don’t know what a gradient, Jacobian or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book.
Chapter 4 of the Deep Learning Book: This covers basics of gradient-based optimization. Read through Section 4.3.
Chapter 6 of Deep Learning: This covers the basics from a more modern perspective. To my mind, if you’ve read Tom Mitchell, it is mostly useful for covering different kinds of activation functions. Read through Section 6.4
Chapter 7 of the Deep Learning Book: Covers regularization. The minimal useful read is Sections 7.1 and 7.4, but this assumes you’ll read the papers some of the other sections are based on. Those papers are in the additional readings. If you don’t read those, then I’d add Sections 7.9, 7.12, and 7.13.
Chapter 8 of the Deep Learning Book: This covers optimization. Read through Section 8.5; beyond that, the material is outside the scope of the class.
Chapter 9 of Deep Learning: Convolutional networks. Read Sections 9.1 through 9.4 and 9.10.
—— no book chapter below this line will be expected for the midterm ——
Understanding LSTMs: A simple (maybe too simple?) walk-through of LSTMs. Good to read before trying the book chapter on this topic.
Chapter 10 of Deep Learning: RNNs and LSTMs
Reinforcement Learning: An Introduction, Chapters 3 and 6: This gives you the basics of what reinforcement learning (RL) is about.
Generalization and Network Design Strategies: The original 1989 paper where LeCun describes Convolutional networks.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Explaining and Harnessing Adversarial Examples: This paper got the ball rolling by showing how to make images that look normal to humans but are consistently misclassified by trained deep nets.
Generative Adversarial Nets: The paper that introduced GANs.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Explains a widely-used regularizer
—— no additional reading below this line will be expected for the midterm ——
DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: This is an end-to-end model. Many papers build on this. The homework uses the discriminator approach from this paper.
Long Short-Term Memory: The original 1997 paper introducing the LSTM
Playing Atari with Deep Reinforcement Learning: A key paper that showed how reinforcement learning can be used with deep nets. This is discussed in class.
Deep Reinforcement Learning: Pong from Pixels: This is the blog we base part of Homework 4 on.
The Illustrated Transformer: A good walkthrough that helps a lot with understanding transformers
Attention is All You Need: The paper that introduced transformers, which are a popular and more complicated kind of attention network.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: A widely-used language model based on Transformer encoder blocks.
The Illustrated GPT-2: Not a paper, but a good overview of GPT-2 and its relation to Transformer decoder blocks.
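The transformer readings above all build on one equation, scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The framework-free sketch below computes it for a single query vector; the toy vectors are made up for illustration and are not from any of the papers.

```python
import math

# Scaled dot-product attention for one query vector, as in
# "Attention is All You Need": softmax over (q . k_i / sqrt(d_k)) scores,
# used to take a weighted average of the value vectors.

def attend(query, keys, values):
    d_k = len(query)
    # Similarity of the query to each key, scaled by sqrt(d_k).
    scores = [sum(qj * kj for qj, kj in zip(query, k)) / math.sqrt(d_k)
              for k in keys]
    # Softmax the scores into attention weights that sum to 1.
    max_s = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: attention-weighted average of the value vectors.
    dim = len(values[0])
    return [sum(wt * v[j] for wt, v in zip(weights, values)) for j in range(dim)]

# The query matches the first key far better, so the output is pulled
# toward the first value vector.
out = attend(query=[1.0, 0.0], keys=[[1.0, 0.0], [0.0, 1.0]],
             values=[[10.0, 0.0], [0.0, 10.0]])
print([round(x, 2) for x in out])
```

In a real transformer this runs for every query position at once (as matrix products) and in several heads in parallel, but the per-query computation is exactly this weighted average.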