Tuesdays and Thursdays, 9:30am - 10:50am Central Time
2122 Sheridan Rd, Classroom 250
Professor: Bryan Pardo
TAs: Hugo Flores Garcia, Patrick O’Reilly
Peer Mentors: Jerry Cao, Saumya Pailwan, Anant Poddar, Nathan Pruyne
Anant Poddar: Mondays 3pm - 5pm, Mudd 3rd floor front counter
Nathan Pruyne: Mondays 5pm - 7pm, Mudd 3207
Bryan Pardo: Tuesdays 11am - noon, Mudd 3115
Saumya Pailwan: Wednesdays 2pm - 4pm, Mudd 3rd floor front counter
Jerry Cao: Wednesdays 5pm - 7pm, Mudd 3rd floor front counter
Patrick O’Reilly: Thursdays 12pm - 1pm and 2pm - 3pm, Mudd 3207
Hugo Flores Garcia: Thursdays 2pm - 4pm, Mudd 3207
This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other topics covered include regularization, loss functions, and gradient descent.
Learning will be in the practical context of implementing networks with these architectures in a modern programming environment: PyTorch. Homework consists of a mixture of programming assignments, reviews of research papers, experiments with deep learning architectures, and theoretical questions about deep learning.
Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. They should also be able to understand current research in deep networks.
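To give a concrete (and purely illustrative) sense of the kind of PyTorch code the programming assignments involve, here is a minimal sketch of defining and training a small multi-layer perceptron with gradient descent. The network shape, the random stand-in data, and the hyperparameters are invented for this example, not taken from any homework.

    import torch
    import torch.nn as nn

    # A small multi-layer perceptron: one hidden layer with a ReLU activation.
    model = nn.Sequential(
        nn.Linear(20, 64),
        nn.ReLU(),
        nn.Linear(64, 2),
    )

    # Random stand-in data: 128 examples with 20 features and binary class labels.
    x = torch.randn(128, 20)
    y = torch.randint(0, 2, (128,))

    loss_fn = nn.CrossEntropyLoss()                           # a common classification loss
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # plain gradient descent

    for epoch in range(10):
        optimizer.zero_grad()         # clear gradients from the previous step
        loss = loss_fn(model(x), y)   # forward pass and loss computation
        loss.backward()               # backpropagation of the error
        optimizer.step()              # update the weights

Swapping the Sequential block for a convolutional or recurrent model leaves the rest of this training loop essentially unchanged, which is part of what makes this workflow convenient for experimenting with different architectures.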
This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.
The primary text is the Deep Learning book. The reading will be supplemented by key papers in the field.
Please use CampusWire for class-related questions.
Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
You will be graded on a 100-point scale (e.g., 93-100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-, and so on).
Homework and reading assignments are solo assignments and must be your own original work. Use of large language models for answer generation is not allowed.
There is an extra credit assignment worth 10 points. More details soon.
Anaconda is the most popular Python distribution for machine learning.
PyTorch is Facebook’s popular deep learning package. My lab uses this.
TensorBoard is what my lab uses to visualize how experiments are going (a small logging sketch appears after this list).
TensorFlow is Google’s most popular Python DNN package.
Keras is a nice programming API that works with TensorFlow.
JAX is an alpha package from Google that provides differentiation of NumPy code and an optimizing compiler for working on tensor processing units.
Trax is Google Brain’s DNN package. It focuses on transformers and is implemented on top of JAX.
MXNet is Apache’s open-source DL package.
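Since TensorBoard comes up above, here is a minimal, hedged sketch of logging a training metric from PyTorch so that a curve shows up in the TensorBoard dashboard. The run directory name and the logged values are made up for illustration.

    from torch.utils.tensorboard import SummaryWriter

    # Write event files under ./runs/demo (a made-up directory name).
    writer = SummaryWriter("runs/demo")

    # Log a fake, steadily decreasing "loss" once per step so a curve appears.
    for step in range(100):
        writer.add_scalar("loss/train", 1.0 / (step + 1), step)

    writer.close()

You can then view the curves by running tensorboard --logdir runs and opening the local URL it prints.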
Deep Learning is THE book on deep learning. One of the authors won the Turing Award for his work on deep learning.
Dive into Deep Learning provides example code and instruction on how to write DL models in PyTorch, TensorFlow, and MXNet.
Google’s Colab offers free GPU time and a nice environment for running Jupyter notebook-style projects. For $10 per month, you also get priority access to GPUs and TPUs.
Amazon’s SageMaker offers hundreds of free hours for new users.
The CS Department’s Wilkinson Lab just got 22 new machines, each with a graphics card suitable for deep learning. They should be remotely accessible and running Linux with all the Python packages needed for deep learning.
Chapter 4 of Machine Learning: READ THIS FIRST. This is Tom Mitchell’s book. It gives a historical overview and an explanation of backpropagation of error. It’s a good starting point for actually understanding deep nets.
What are Gradients, Jacobians, and Hessians? This isn’t a book chapter, but if you don’t know what a gradient, Jacobian, or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book.
Chapter 4 of the Deep Learning Book: This covers basics of gradient-based optimization.
Chapter 6 of Deep Learning: This covers the basics from a more modern perspective. To my mind, if you’ve read Tom Mitchell, it is mostly useful for covering different kinds of activation functions.
Chapter 7 of the Deep Learning Book: Covers regularization.
Chapter 8 of the Deep Learning Book: This covers optimization.
Chapter 9 of Deep Learning: Convolutional networks.
Understanding LSTMs: A simple (maybe too simple?) walk-through of LSTMs. Good to read before trying the book chapter on this topic.
Chapter 10 of Deep Learning: RNNs and LSTMs.
Reinforcement Learning: An Introduction, Chapters 3 and 6: This gives you the basics of what reinforcement learning (RL) is about.
Generalization and Network Design Strategies: The original 1989 paper where LeCun describes convolutional networks.
Explaining and Harnessing Adversarial Examples : This paper got the ball rolling by pointing out how to make images that look good but are consistently misclassified by trained deepnets.
Generative Adversarial Nets: The paper that introduced GANs.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Explains a widely-used regularizer.
DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: This is an end-to-end model. Many papers build on this. The homework uses the discriminator approach from this paper.
Long Short-Term Memory: The original 1997 paper introducing the LSTM.
Playing Atari with Deep Reinforcement Learning: A key paper that showed how reinforcement learning can be used with deep nets. This is discussed in class.
Deep Reinforcement Learning: Pong from Pixels: This is the blog we base part of Homework 4 on.
The Illustrated Transformer: A good walkthrough that helps a lot with understanding transformers
Attention is All You Need: The paper that introduced transformers, which are a popular and more complicated kind of attention network.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: A widely-used language model based on Transformer encoder blocks.
The Illustrated GPT-2: Not a paper, but a good overview of GPT-2 and its relation to Transformer decoder blocks.