Course materials for the Interactive Audio Lab

In this course students will study deep learning architectures such as restricted Boltzmann machines, deep neural networks, convolutional neural networks, deep belief networks, and recurrent neural networks (LSTMs, GRUs). They will read the original research papers that describe these algorithms and how they have been applied to fields such as computer vision, machine translation, automatic speech recognition, and audio event recognition.

Week | Date | Topic | Deliverable | Points |
---|---|---|---|---|
Every week | | Class Participation | | 2 per week * 10 weeks = 20 |
Once | | Present a topic area | | 20 |
1 | March 29 | Perceptrons, MLPs | | |
2 | April 5 | RBMs, Deep Belief Networks | | |
3 | April 12 | Regularization & Optimization | | |
4 | April 19 | Convolutional Networks | | |
5 | April 26 | ImageNet and Descendants | 10 paper reviews | 20 |
6 | May 3 | Deep and Adversarial Networks | | |
7 | May 10 | Recurrent Networks | | |
8 | May 17 | Long Short-Term Memory | | |
9 | May 24 | Reinforcement Learning | 10 paper reviews | 20 |
10 | May 31 | CTC and Highway LSTM | | |
11 | June 7 | Final Project | | 20 |

Anaconda: The most popular Python distribution for machine learning

scikit-learn: The most popular machine learning package for Python

TensorFlow: The most popular Python DNN package

Keras: A nice Python API for TensorFlow

My guide to installing Keras and TensorFlow on macOS

Susho says PyTorch is the way to go for RNNs

Chapter 1 of Parallel Distributed Processing

Blog: A hacker’s guide to neural nets
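In the from-scratch spirit of that guide, here is a minimal sketch of training by numerically estimated gradients (the function and all names are my own toy example, not code from the blog):

```python
# Fit y = w * x to a single data point (x=2, y=6) by nudging w
# along a numerically estimated gradient of the squared error.
def loss(w, x=2.0, y=6.0):
    return (w * x - y) ** 2

w, lr, eps = 0.0, 0.01, 1e-4
for _ in range(1000):
    # Central-difference estimate of dL/dw -- no calculus required.
    grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    w -= lr * grad

print(round(w, 3))  # → 3.0 (since 3 * 2 = 6)
```

The same nudge-and-measure idea scales up (conceptually, not computationally) to the multi-gate circuits the guide builds; backpropagation is what makes the gradient cheap to get exactly.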

Chapter 16 and 20 of Deep Learning Book (just the bits on RBMs)

Scaling Learning Algorithms towards AI

Chapters 7 and 8 of the Deep Learning Book

Kezhen: Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

A blog describing how to easily implement Self-normalizing networks

Chapter 9 of Deep Learning: Convolutional Networks

Joe: Generalization and Network Design Strategies

Kezhen: Convolutional Networks for Images, Speech, and Time-Series

Bryan: Network In Network

Going Deeper with Convolutions

Wei: ImageNet Classification with Deep Convolutional Neural Networks

Yiming: What I learned from competing against a ConvNet on ImageNet

Blair: Visualizing and Understanding Convolutional Networks

A guide to convolution arithmetic for deep learning
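The central formula from convolution arithmetic fits in one line; a quick sketch (the helper name is my own, not from the guide):

```python
# Spatial output size of a convolution along one dimension:
# out = floor((in + 2*pad - kernel) / stride) + 1
def conv_out_size(size, kernel, stride=1, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

# AlexNet's first layer: 227x227 input, 11x11 kernel, stride 4.
print(conv_out_size(227, 11, stride=4))  # → 55

# "Same" padding with a 3x3 kernel preserves the input size.
print(conv_out_size(5, 3, pad=1))  # → 5
```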

Deconvolution and Checkerboard Artifacts

Bryan: Rich feature hierarchies for accurate object detection and semantic segmentation

Selective Search for Object Recognition

Wei: Very Deep Convolutional Networks for Large-Scale Image Recognition

Huayi: Deep Residual Learning for Image Recognition

Fast R-CNN (deep ResNets use this)

Joe: Explaining and Harnessing Adversarial Examples

Sushobhan: Generative Adversarial Nets

Julia Evans’ walkthrough of generating adversarial examples

Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

A tutorial on Back Propagation Through Time
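To make BPTT concrete before reading the tutorial, here is a tiny worked example on a one-dimensional linear RNN, h_t = w·h_{t-1} + x_t with loss L = h_T; all names are my own and the example is not taken from the tutorial. The backward pass's gradient is checked against a finite-difference estimate:

```python
def forward(w, xs):
    # Unroll the recurrence h_t = w * h_{t-1} + x_t from h_0 = 0.
    h, hs = 0.0, [0.0]
    for x in xs:
        h = w * h + x
        hs.append(h)
    return hs  # hidden states h_0 .. h_T

def bptt_grad(w, xs):
    # Backpropagation through time for L = h_T.
    hs = forward(w, xs)
    dh, dw = 1.0, 0.0          # dL/dh_T = 1
    for t in range(len(xs), 0, -1):
        dw += dh * hs[t - 1]   # d h_t / dw = h_{t-1} (local term)
        dh *= w                # chain rule back through h_{t-1}
    return dw

xs, w, eps = [1.0, 2.0, 3.0], 0.5, 1e-6
numeric = (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)
print(abs(bptt_grad(w, xs) - numeric) < 1e-6)  # → True
```

The repeated multiplication by `w` in the backward loop is exactly where vanishing/exploding gradients come from, which motivates the next two readings.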

On the difficulty of training recurrent neural networks

Ben: Long Short-Term Memory

Deep Visual-Semantic Alignments for Generating Image Descriptions

Bidirectional RNNs: Neural Machine Translation by Jointly Learning to Align and Translate

Experiments in Handwriting with an LSTM Neural Network

CJ: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Yiming: Learning to Forget: Continual Prediction with LSTM

CJ: Sequence to Sequence Learning with Neural Networks

The NIPS talk on Sequence to Sequence Learning

Reinforcement Learning: An Introduction, Chapters 3 and 6

My lecture notes on reinforcement learning (easier intro)
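As a companion to those chapters, a minimal tabular Q-learning sketch on a toy three-state chain (my own toy environment, not an example from the book or the notes):

```python
import random

# States 0, 1, 2 on a chain; actions 0 = left, 1 = right.
# Reaching state 2 gives reward 1 and ends the episode.
random.seed(0)
n_states, n_actions = 3, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.1   # step size, discount, exploration

for _ in range(500):
    s = 0
    while s != 2:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: Q[s][i])
        s2 = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == 2 else 0.0
        # Q-learning update; terminal states have no future value.
        target = r if s2 == 2 else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# The learned greedy policy heads right in both non-terminal states.
print(Q[0][1] > Q[0][0] and Q[1][1] > Q[1][0])  # → True
```

The `target` line is the temporal-difference update from Chapter 6; swapping `max(Q[s2])` for the value of the action actually taken turns this into SARSA.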

Sushobhan: Attention and Augmented Recurrent Neural Networks

Max: Playing Atari with Deep Reinforcement Learning

Huayi: Mastering the game of Go with deep neural networks and tree search

Highway Networks (useful for understanding Highway LSTMs)

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

DeepMath - Deep Sequence Models for Premise Selection

A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

Do Deep Nets Really Need to be Deep?

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

Quasi-Recurrent Neural Networks

Cross Modal Distillation for Supervision Transfer

Tutorial on Variational Autoencoders

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Describing Videos by Exploiting Temporal Structure

The Illustrated Transformer (This is a tutorial that explains the concepts in “Attention is All You Need”)

Deep clustering: Discriminative embeddings for segmentation and separation

Deep Attractor Network for Single-Microphone Speaker Separation

SoundNet: Learning Sound Representations from Unlabeled Video

Unsupervised Learning of Semantic Audio Representations

Deep Ranking: Triplet MatchNet for Music Metric Learning

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Deep Cross-Modal Audio-Visual Generation

Building End-to-End Dialogue Systems Using Generative Hierarchical Neural Network Models

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Slides explaining what an iVector is

Analysis of I-vector Length Normalization in Speaker Recognition Systems

WaveNet: A Generative Model for Raw Audio

Deep Voice: Real-time Neural Text-to-Speech

SampleRNN: Bengio’s take on generating speech

Tacotron is the speech generation system used at Google

Tacotron 2 is the 2018 speech generation system from Google

Pixel Recurrent Neural Networks (WaveNet is based on this)

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Conditional Image Generation with PixelCNN Decoders

Learning Confidence for Out-of-Distribution Detection in Neural Networks

Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples

The Reddit discussion on the current state of the art in estimating confidence

Understanding deep learning requires rethinking generalization