Top | Calendar | Links | Readings |
M W F, 10:00am - 10:50am Central Time
Mondays: Swift Hall 107. Wednesdays: Swift Hall 107. Fridays: online Python notebook walk-throughs.
Bryan Pardo: Office hours Thursdays 9:30am - 11:00am. Zoom link: the default class Zoom (find it on Canvas)
Patrick O’Reilly: Office hours Wednesdays 2:00pm - 3:00pm, Saturdays 2:00pm - 3:00pm. Zoom link: the default class Zoom (find it on Canvas)
Aldo Aguilar: Office hours Fridays 1:00pm - 3:00pm Mudd 3534
Andreas Bugler: Office hours Mondays and Wednesdays 1:00pm - 2:00pm Mudd 3534
Noah Schaffer: Office hours Tuesdays 9:00am - 11:00am Mudd 3532
This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other covered topics include regularization, loss functions and gradient descent.
Learning will be in the practical context of implementing networks using these architectures in a modern programming environment: PyTorch. Homework consists of a mixture of programming assignments, reviews of research papers, running experiments with deep learning architectures, and theoretical questions about deep learning.
Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. Students completing the course should also be able to understand current research in deep networks.
This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.
The primary text is the Deep Learning book. This reading will be supplemented by reading key papers in the field.
Please use CampusWire for class-related questions.
Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can’t finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
You can earn up to 110 points. You will be graded on a 100 point scale (e.g. 93 to 100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-…and so on).
Homework and reading assignments are solo assignments and must be original work.
Every student gets 3 points. For doing nothing. Therefore, when you look at a homework you think was graded a little low…remember, we already gave you 3 points. That likely makes up for the small thing you wish were graded differently. So maybe let it go? We already gave you the points for that, after all.
Students can receive up to 9 points (nearly a full letter grade) of extra credit by submitting reviews of three research papers selected from the course reading list. No additional extra credit beyond this will be provided. No requests for extra-extra credit will be considered.
Week | Day and Date | Topic (tentative) | Due today | Points |
---|---|---|---|---|
1 | Tue Mar 29 | Class intro | ||
1 | Wed Mar 30 | Perceptrons | ||
1 | Fri Apr 01 | Notebook 1: perceptrons | ||
2 | Mon Apr 04 | Gradient descent | Reading 1 | 9 |
2 | Wed Apr 06 | Backpropagation of error | ||
2 | Fri Apr 08 | Notebook 2: MLP in PyTorch | ||
3 | Mon Apr 11 | Multi-layer perceptrons | Homework 1 | 11 |
3 | Wed Apr 13 | Convolutional nets | ||
3 | Fri Apr 15 | Notebook 3: Image Classification | ||
4 | Mon Apr 18 | Regularization | Reading 2 | 9 |
4 | Wed Apr 20 | Data augmentation & generalization | ||
4 | Fri Apr 22 | Notebook 4: CNNs & Logging | ||
5 | Mon Apr 25 | Visual adversarial examples | Homework 2 | 11 |
5 | Wed Apr 27 | Auditory adversarial examples | ||
5 | Fri Apr 29 | Notebook 5: adversarial examples | ||
6 | Mon May 02 | Generative adversarial networks (GANs) | Reading 3 | 9 |
6 | Wed May 04 | More GANs | ||
6 | Fri May 06 | Notebook 6: GANs | ||
7 | Mon May 09 | Unsupervised methods | ||
7 | Wed May 11 | Recurrent nets | Homework 3 | 11 |
7 | Fri May 13 | Notebook 7: autoencoders | ||
8 | Mon May 16 | LSTMs | ||
8 | Wed May 18 | Attention networks | Reading 4 | 9 |
8 | Fri May 20 | Transformers | ||
9 | Mon May 23 | Notebook 8: RNNs | ||
9 | Wed May 25 | Reinforcement learning (RL) | Reading 5 | 9 |
9 | Fri May 27 | Deep RL | ||
10 | Mon May 30 | NO CLASS, MEMORIAL DAY | ||
10 | Wed Jun 01 | Current research in DL | Homework 4 | 11 |
10 | Fri Jun 03 | Current research in DL | ||
11 | Wed Jun 08 | No final exam, just final reading | Reading 6 | 9 |
11 | Fri Jun 10 | Optional extra credit reading | Extra Credit | 9 |
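
As a quick sanity check on the grading arithmetic, here is a minimal Python sketch that tallies the point values listed in the calendar above (six readings at 9 points, four homeworks at 11 points, the 3 free points, and the 9 extra credit points). The constant and function names are made up for illustration, and the tally assumes the tentative calendar values do not change.

```python
# Point values copied from the (tentative) calendar; names are illustrative only.
NUM_READINGS = 6
READING_POINTS = 9
NUM_HOMEWORKS = 4
HOMEWORK_POINTS = 11
FREE_POINTS = 3          # the points every student gets for doing nothing
EXTRA_CREDIT_POINTS = 9  # optional extra credit reading reviews


def max_points(include_extra_credit=True):
    """Return the maximum number of points available in the course."""
    total = (
        NUM_READINGS * READING_POINTS
        + NUM_HOMEWORKS * HOMEWORK_POINTS
        + FREE_POINTS
    )
    if include_extra_credit:
        total += EXTRA_CREDIT_POINTS
    return total


if __name__ == "__main__":
    print("Without extra credit:", max_points(include_extra_credit=False))
    print("With extra credit:", max_points())
```

Under these assumptions the totals come to 101 points without extra credit and 110 with it, which matches the "up to 110 points" figure in the grading policy.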
Anaconda is the most popular Python distro for machine learning.
PyTorch is Facebook’s popular deep learning package. My lab uses this. TensorBoard is what my lab uses to visualize how experiments are going.
TensorFlow is Google’s most popular Python DNN package.
Keras is a nice programming API that works with TensorFlow.
JAX is an alpha package from Google that allows differentiation of NumPy code and also provides an optimizing compiler for working on tensor processing units.
Trax is Google Brain’s DNN package. It focuses on transformers and is implemented on top of JAX.
MXNet is Apache’s open source DL package.
Deep Learning is THE book on Deep Learning. One of the authors won the Turing Award for his work on deep learning.
Dive Into Deep Learning provides example code and instruction for how to write DL models in PyTorch, TensorFlow and MXNet.
Google’s Colab offers free GPU time and a nice environment for running Jupyter notebook-style projects. For $10 per month, you also get priority access to GPUs and TPUs.
Amazon’s SageMaker offers hundreds of free hours for newbies.
The CS Department Wilkinson Lab just got 22 new machines that each have a graphics card suitable for deep learning. They should be remotely accessible and running Linux with all the Python packages needed for deep learning.
The Organization of Behavior: Hebb’s 1949 book that provides a general framework for relating behavior to synaptic organization through the dynamics of neural networks.
The Perceptron: This is the 1st neural networks paper, published in 1958. The algorithm won’t be obvious, but the thinking is interesting and the conclusions are worth reading.
The Perceptron: A Perceiving and Recognizing Automaton: This one is an earlier paper by Rosenblatt that is, perhaps, even more historical than the 1958 paper and a bit easier for an engineer to follow, I think.
* Chapter 4 of Machine Learning: This is Tom Mitchell’s book. Historical overview + explanation of backprop of error. It’s a good starting point for actually understanding deep nets. START HERE. IT’S WORTH 2 READINGS. WHAT THAT MEANS IS…GIVE ME 2 PAGES OF REACTIONS FOR THIS READING AND GET CREDIT FOR 2 READINGS.
Chapter 6 of Deep Learning: Modern intro on deep nets. To me, this is harder to follow than Chapter 4 of Machine Learning, though. Certainly, it’s longer.
This reading is NOT worth points, but……if you don’t know what a gradient, Jacobian or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book.
Chapter 4 of the Deep Learning Book: This covers basics of gradient-based optimization. Start here for optimization
Chapter 8 of the Deep Learning Book: This covers optimization. This should come 2nd in your optimization reading
Why Momentum Really Works: Reading this will help you understand the popular Adam optimizer better.
On the Difficulty of Training Recurrent Neural Networks: A 2013 paper that explains vanishing and exploding gradients.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. This is one of the most common approaches to normalization.
AutoClip: Adaptive Gradient Clipping for Source Separation Networks is a recent paper out of Pardo’s lab that helps deal with unruly gradients. There’s also a video for this one.
Generalization and Network Design Strategies: The original 1989 paper where LeCun describes Convolutional networks. Start here.
Chapter 7 of the Deep Learning Book: Covers regularization.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Explains a widely-used regularizer
Understanding deep learning requires rethinking generalization: Thinks about the question “why aren’t deep nets overfitting even more than they seem to be”?
The Implicit Bias of Gradient Descent on Separable Data: A study of bias that is actually based on the algorithm, rather than the dataset.
Visualizing and Understanding Convolutional Networks: How do you see what the net is thinking? Here’s one way.
Local Interpretable Model-Agnostic Explanations (LIME): An Introduction: A technique to explain the predictions of any machine learning classifier.
If you already understand what convolutional networks are, then here are some popular architectures you can find out about.
Deep Residual Learning for Image Recognition: The 2016 paper that introduces the popular ResNet architecture, which can get more than 100 layers deep.
Very Deep Convolutional Networks for Large-Scale Image Recognition: The 2015 paper introducing the popular VGG architecture.
Going Deeper with Convolutions: The 2015 paper describing the Inception network architecture.
Explaining and Harnessing Adversarial Examples: This paper got the ball rolling by pointing out how to make images that look good but are consistently misclassified by trained deep nets.
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images: This paper shows just how screwy you can make an image and still have it misclassified by a “well trained, highly accurate” image recognition deep net.
Effective and Inconspicuous Over-the-air Adversarial Examples with Adaptive Filtering: Cutting edge research from our very own Patrick O.
Generative Adversarial Nets: The paper that introduced GANs. If you read only one GAN paper, make it this one.
2016 Tutorial on Generative Adversarial Networks by one of the creators of the GAN. This one’s long, but good.
DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: This is an end-to-end model. Many papers build on this. The homework uses the discriminator approach from this paper
Generative Adversarial Text to Image Synthesis: This paper describes generating images conditioned on text descriptions. Pretty interesting…
Chapter 10 of Deep Learning: A decent starting point
The Recurrent Neural Networks Tutorial: This is a 4-part tutorial that starts with an overview and then gets deep into coding up an RNN using Theano (not PyTorch) and has links to GitHub repositories with all the examples. If you just read this for the points, read Part 1. But go deep, if you’re interested, and read all the parts. NOTE the links to the code repositories work. Many of the other hyperlinks don’t.
* Extensions of recurrent neural network language model: This covers the RNN language model discussed in class.
Long Short-Term Memory: The original 1997 paper introducing the LSTM.
Understanding LSTMs: A simple (maybe too simple?) walk-through of LSTMs
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling: Compares a simplified LSTM (the GRU) to the original LSTM and also simple RNN units.
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention): **This is a good starting point on attention models.**
Sequence to Sequence Learning with Neural Networks: This is the paper that the link above was trying to explain.
* Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation: This paper introduces encoder-decoder networks for translation. Attention models were first built on this framework. Covered in class.
* Neural Machine Translation by Jointly Learning to Align and Translate: This paper introduces additive attention to an encoder-decoder. Covered in class.
* Effective Approaches to Attention-based Neural Machine Translation: Introduced multiplicative attention. Covered in class.
Massive Exploration of Neural Machine Translation Architectures: A 2017 paper that settles the questions about which architecture is best for doing translation…except that the Transformer model came out that same year and upended everything. Still, a good overview of the pre-transformer state-of-the-art.
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention: Attention started with text, but is now applied to images. Here’s an example.
Listen, Attend and Spell: Attention is also applied to speech, as per this example.
A Tutorial in TensorFlow: This walks through how to use TensorFlow 1.X to build a neural machine translation network with attention.
The Illustrated Transformer: A good walkthrough that helps a lot with understanding transformers. **I’d start with this one to learn about transformers.**
The Annotated Transformer: An annotated walk-through of the “Attention is All You Need” paper, complete with a detailed Python implementation of a transformer.
Attention is All You Need: The paper that introduced transformers, which are a popular and more complicated kind of attention network.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: A widely-used language model based on Transformer encoder blocks.
The Illustrated GPT-2: A good overview of GPT-2 and its relation to Transformer decoder blocks.
Reinforcement Learning: An Introduction, Chapters 3 and 6: This gives you the basics of what reinforcement learning (RL) is about.
Playing Atari with Deep Reinforcement Learning: A key paper that showed how reinforcement learning can be used with deep nets.
Mastering the game of Go with deep neural networks and tree search: A famous paper that showed how RL + Deepnets = the best Go player in existence at the time.
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play: This is the AlphaZero paper. AlphaZero is the best Go player…and a great chess player.