# Notebook 4: Data Augmentation and Logging

In this notebook, we'll expand our training loop for image classification to include __data augmentation__. We'll also use PyTorch's built-in __logging__ tools to monitor our network's progress as it trains.

The notebook is broken up as follows:

  1. [Setup](#setup)  
  2. [Neural Networks for Image Recognition](#review)
  3. [Data Augmentation](#augmentation)  
  4. [Logging](#logging)  

## __1.__ <a name="setup">Setup</a>


Make sure the needed packages are installed and utility code is in the right place.

In [None]:
# helper code from the course repository
!git clone https://github.com/interactiveaudiolab/course-deep-learning.git
# install common packages used for deep learning
!cd course-deep-learning/ && pip install -r requirements.txt

In [None]:
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.datasets as datasets
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline
%cd course-deep-learning/

## __2.__ <a name="review">Neural Networks for Image Recognition</a>

In the previous notebook, we designed and trained a neural network to perform digit recognition on the MNIST dataset. In this notebook, we'll also consider a __convolutional neural network__ for the same task. Recall that convolutional networks use weight __kernels__ to capture correlations between neighboring coordinates. We can wrap the application of these kernels into a "layer" in the same way we do for weight-input dot products in a multilayer perceptron.

In PyTorch, we can define a two-dimensional convolutional layer as follows:

```
conv_layer = nn.Conv2d(
  in_channels,
  out_channels,
  kernel_size,
  stride
)
```
Some things to keep in mind:
* `in_channels` refers to the number of channels in the input. In our case, because MNIST images are grayscale (1 channel), this value will be 1 for our first layer. 
* `kernel_size` can be either a tuple specifying `(kernel_height, kernel_width)` or an integer, in which case both the kernel height and width will be set to this value. Each kernel in the layer will have dimension `(in_channels, kernel_height, kernel_width)`, and will produce a single-channel feature map when applied to the input. Thus, `out_channels` refers to both the number of channels (feature maps) in the output and the number of convolutional kernels applied in the layer. 
* `stride` refers to the hop size when applying kernels, and can be either a tuple (specifying vertical and horizontal hop sizes) or an integer (in which case the same value will be used for both). 
* For an overview of more options, see the official [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).
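To make the shapes described above concrete, here is a minimal sketch that builds one such layer and applies it to a dummy batch. The batch size of 8 and the all-zeros input are arbitrary choices for illustration; the layer settings (1 input channel, 32 kernels of size 3, stride 1) are just one plausible configuration for MNIST-sized images.

```python
import torch
import torch.nn as nn

# a layer with 1 input channel and 32 kernels of size 3x3, applied with stride 1
conv_layer = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1)

# a dummy batch of 8 MNIST-sized grayscale images: (batch_size, channels, height, width)
x = torch.zeros(8, 1, 28, 28)
out = conv_layer(x)

# one kernel per output channel, each of shape (in_channels, kernel_height, kernel_width)
print(conv_layer.weight.shape)  # torch.Size([32, 1, 3, 3])

# with no padding, each spatial dimension shrinks to (28 - 3) // 1 + 1 = 26
print(out.shape)  # torch.Size([8, 32, 26, 26])
```

Because no padding is used, every convolution trims `kernel_size - 1` pixels from the height and width, which is worth tracking when sizing any fully-connected layers that follow.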

In addition to convolution, we'll experiment with two additional types of layers:
* __Dropout__ randomly zeros elements of an input tensor with a given probability, ensuring that the network learns more robust and general features. In order to apply dropout at training time but _not_ at inference time, we can call `.train()` and `.eval()` on our network as usual; these will automatically set the behavior of any dropout layers in the model. For more details, see the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html).
* __Max-Pooling__ can be thought of as a convolutional layer with `out_channels=in_channels`, but with the kernel dot-product operation replaced by a maximum taken over each window, independently for each channel. This can be used to "pool" or compress the spatial (height/width) dimensions of tensors as they pass through the network. In `nn.MaxPool2d`, the stride defaults to the kernel size, so windows do not overlap unless a smaller stride is given. For more details, see the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html).
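The behavior of both layers can be checked on small dummy tensors. The sketch below is illustrative only: the tensor sizes and the dropout probability of 0.5 are arbitrary assumptions, not values required by either layer.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# in training mode, elements are zeroed at random and survivors are scaled by 1 / (1 - p)
dropout.train()
train_out = dropout(x)
print(train_out.unique())  # every value is either 0 or 2

# in evaluation mode, dropout leaves its input unchanged
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))  # True

# max-pooling with a 4x4 window; the stride defaults to the window size
pool = nn.MaxPool2d(4)
feature_maps = torch.randn(8, 64, 24, 24)
pooled = pool(feature_maps)
print(pooled.shape)  # torch.Size([8, 64, 6, 6])
```

The rescaling in training mode keeps the expected value of each activation the same in both modes, which is why no extra correction is needed at inference time.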

#### Model Definitions

In [None]:
class LinearNetwork(nn.Module):

  def __init__(self):
    """The multi-layer perceptron from our previous notebook"""
    super().__init__()

    # MNIST images are (1, 28, 28) (channels, height, width)
    self.layer_1 = nn.Linear(28*28, 1024)
    self.layer_2 = nn.Linear(1024, 10)
    self.relu = nn.ReLU()

  def forward(self, x):

    batch_size, channels, height, width = x.size()
    x = x.view(batch_size, -1)  # create an array of flattened images with dimension (batch_size, num_pixels)

    # this time, we'll use the ReLU nonlinearity at each layer  
    x = self.relu(self.layer_1(x))
    x = self.layer_2(x)  # we'll avoid "squashing" our final outputs by omitting the sigmoid

    return x


class ConvNetwork(nn.Module):
    """
    A simple convolutional neural network for image classification.
    From https://github.com/pytorch/examples/blob/master/mnist/main.py
    """

    def __init__(self):
      super().__init__()

      # convolutional layers
      self.conv1 = nn.Conv2d(1, 32, 3, 1)
      self.conv2 = nn.Conv2d(32, 64, 3, 1)

      # just like in our fully-connected network, we'll use ReLU activations
      self.relu = nn.ReLU()

      # random dropout with two different "strengths"
      self.dropout1 = nn.Dropout(0.25)  # we pass the dropout probability
      self.dropout2 = nn.Dropout(0.5)

      # max-pooling
      self.pool = nn.MaxPool2d(4)

      # a final fully-connected network to map our learned convolutional
      # features to class predictions
      self.fc1 = nn.Linear(64*6*6, 128)
      self.fc2 = nn.Linear(128, 10)

    def forward(self, x):

      # inputs are expected to have shape (batch_size, 1, 28, 28)
      x = self.conv1(x)
      x = self.relu(x)

      # our first convolutional layer reshapes inputs to (batch_size, 32, 26, 26)
      x = self.conv2(x)
      x = self.relu(x)

      # our second convolutional layer reshapes inputs to (batch_size, 64, 24, 24)
      x = self.pool(x)
      x = self.dropout1(x)

      # our pooling layer reduces inputs to (batch_size, 64, 6, 6)
      x = torch.flatten(x, 1)

      # we "flatten" inputs to (batch_size, 64 * 6 * 6) before passing to a 
      # small fully-connected network
      x = self.fc1(x)
      x = self.relu(x)
      x = self.dropout2(x)
      x = self.fc2(x)

      # our final outputs are vectors of class scores, with shape (batch_size, 10)
      return x


def param_count(m: nn.Module):
  """Count the number of trainable parameters (weights) in a model"""
  return sum(p.numel() for p in m.parameters() if p.requires_grad)


model1 = LinearNetwork()
model2 = ConvNetwork()

params1 = param_count(model1)
params2 = param_count(model2)

print(f'Parameters in fully-connected network: {params1}')
print(f'Parameters in convolutional network: {params2}')
print(f'The convolutional network has {params2/params1 :0.2f}x as many parameters')

#### Training Loop

Next, we'll slightly modify our training loop to allow for different models.

In [None]:
def training_loop(save_path, epochs, batch_size, device="cpu", use_conv=False):
    """
    Train a neural network model for digit recognition on the MNIST dataset.
    
    Parameters
    ----------
    save_path (str):  path/filename for model checkpoint, e.g. 'my_model.pt'
    
    epochs (int):     number of iterations through the whole dataset for training
    
    batch_size (int): size of a single batch of inputs
    
    device (str):     device on which tensors are placed; should be 'cpu' or 'cuda'. 

    use_conv (bool):  if True, use ConvNetwork; else, use LinearNetwork.
    
    Returns
    -------
    model (nn.Module): final trained model
    
    save_path (str):   path/filename for model checkpoint, so that we can load our model
                       later to test on unseen data
    
    device (str):      the device on which we carried out training, so we can match it
                       when we test the final model on unseen data later
    """

    # initialize model
    if use_conv:
      model = ConvNetwork()
      print('Training convolutional neural network...')
    else:
      model = LinearNetwork()
      print('Training fully-connected neural network...')

    print(f'Parameters in model: {param_count(model)}')
    model.to(device)

    # initialize an optimizer to update our model's parameters during training
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    # optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)

    # make a new directory in which to download the MNIST dataset
    data_dir = "./data/"
    
    # initialize a Transform object to prepare our data
    transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        lambda x: x>0,
        lambda x: x.float(),
    ])

    # load MNIST "test" dataset from disk
    mnist_test = datasets.MNIST(data_dir, train=False, download=True, transform=transform)

    # load MNIST "train" dataset from disk and set aside a portion for validation
    mnist_train_full = datasets.MNIST(data_dir, train=True, download=True, transform=transform)
    mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])

    # initialize a DataLoader object for each dataset
    train_dataloader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
    val_dataloader = torch.utils.data.DataLoader(mnist_val, batch_size=batch_size, shuffle=False)
    test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=1, shuffle=False)

    # a PyTorch categorical cross-entropy loss object
    loss_fn = torch.nn.CrossEntropyLoss()

    # time training process
    st = time.time()

    # keep track of best validation accuracy; if improved upon, save checkpoint
    best_acc = 0.0

    # time to start training!
    for epoch_idx, epoch in enumerate(range(epochs)):

        # loop through the entire dataset once per epoch
        train_loss = 0.0
        train_acc = 0.0
        train_total = 0
        model.train()
        for batch_idx, batch in enumerate(train_dataloader):

            # clear gradients
            optimizer.zero_grad()

            # unpack data and labels
            x, y = batch
            x = x.to(device)  # move inputs to the device the model lives on
            y = y.to(device)  # move labels to the same device

            # generate predictions and compute loss
            output = model(x)  # (batch_size, 10)
            loss = loss_fn(output, y)

            # compute accuracy
            preds = output.argmax(dim=1)
            acc = preds.eq(y).sum().item()/len(y)

            # compute gradients and update model parameters
            loss.backward()
            optimizer.step()

            # update statistics
            train_loss += (loss.item() * len(x))  # .item() gives a plain float, detached from the graph
            train_acc += (acc * len(x))
            train_total += len(x)

        train_loss /= train_total
        train_acc /= train_total

        # perform validation once per epoch
        val_loss = 0.0
        val_acc = 0.0
        val_total = 0
        model.eval()
        for batch_idx, batch in enumerate(val_dataloader):

            # don't compute gradients during validation
            with torch.no_grad():

                # unpack data and labels
                x, y = batch
                x = x.to(device)  # move inputs to the device the model lives on
                y = y.to(device)  # move labels to the same device

                # generate predictions and compute loss
                output = model(x)
                loss = loss_fn(output, y)

                # compute accuracy
                preds = output.argmax(dim=1)
                acc = preds.eq(y).sum().item()/len(y)

                # update statistics
                val_loss += (loss.item() * len(x))  # .item() gives a plain float
                val_acc += (acc * len(x))
                val_total += len(x)

        val_loss /= val_total
        val_acc /= val_total
        print(f"Epoch {epoch_idx + 1}: val loss {val_loss :0.3f}, val acc {val_acc :0.3f}, train loss {train_loss :0.3f}, train acc {train_acc :0.3f}")

        if val_acc > best_acc:
            print(f"New best accuracy {val_acc : 0.3f} (old {best_acc : 0.3f}); saving model weights to {save_path}")
            best_acc = val_acc
            torch.save(model.state_dict(), save_path)

    print(f"Total training time (s): {time.time() - st :0.3f}")
    
    return model, save_path, device
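The training loop saves a checkpoint whenever validation accuracy improves, and returns `save_path` and `device` so that the checkpoint can be reloaded later. Note that the returned `model` holds the weights from the final epoch, which may differ from the best saved checkpoint. Below is a minimal sketch of the save-and-reload round trip. It uses a small stand-in model (`make_model` is a hypothetical helper, not part of the notebook) and a temporary file so that it runs on its own; in practice you would rebuild `ConvNetwork` or `LinearNetwork` instead.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    """Hypothetical stand-in for ConvNetwork / LinearNetwork."""
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))


trained = make_model()
checkpoint_path = os.path.join(tempfile.mkdtemp(), "example_checkpoint.pt")

# save only the parameters (the "state dict"), as the training loop does
torch.save(trained.state_dict(), checkpoint_path)

# later: rebuild the same architecture, then load the saved parameters into it
device = "cuda" if torch.cuda.is_available() else "cpu"
restored = make_model()
restored.load_state_dict(torch.load(checkpoint_path, map_location=device))
restored.to(device)
restored.eval()  # switch off training-only behavior such as dropout

# both copies should now produce matching outputs on the same inputs
x = torch.rand(4, 1, 28, 28)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x.to(device)).cpu(), atol=1e-5)
print(same)
```

Passing `map_location` lets a checkpoint trained on a GPU be loaded on a CPU-only machine, and vice versa.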


#### Run It!

Finally, we can compare our convolutional and fully-connected models.

In [None]:
# train a convolutional neural network
conv_model, conv_path, device = training_loop(
    save_path="mnist_cnn.pt", 
    epochs=20, 
    batch_size=60, 
    device="cuda" if torch.cuda.is_available() else "cpu",
    use_conv=True
)

# train a fully-connected neural network
lin_model, lin_path, device = training_loop(
    save_path="mnist_fc.pt", 
    epochs=20, 
    batch_size=60, 
    device="cuda" if torch.cuda.is_available() else "cpu",
    use_conv=False
)

Our convolutional network achieves a classification accuracy __~4%__ higher than our fully-connected network, with fewer than half as many parameters!
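The parameter counts can be checked by hand from the layer sizes in the model definitions. The arithmetic below is a sketch that assumes the layer shapes given earlier (3x3 kernels, a 64 x 6 x 6 feature map entering `fc1`) and counts one bias per output unit or kernel.

```python
# fully-connected network: weights plus biases for each linear layer
fc_params = (28 * 28 * 1024 + 1024) + (1024 * 10 + 10)

# convolutional network: each kernel has in_channels * 3 * 3 weights, plus one bias per kernel
conv_params = (
    (1 * 3 * 3 * 32 + 32)        # conv1
    + (32 * 3 * 3 * 64 + 64)     # conv2
    + (64 * 6 * 6 * 128 + 128)   # fc1
    + (128 * 10 + 10)            # fc2
)

print(fc_params)    # 814090
print(conv_params)  # 315146
print(f"{conv_params / fc_params:0.2f}")  # 0.39
```

Most of the convolutional network's parameters sit in `fc1`; the two convolutional layers together account for fewer than 19,000.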

## __3.__ <a name="augmentation">Data Augmentation</a>

We've got a pretty accurate model, but there are plenty of deep learning tricks we can use to squeeze out some extra performance. One common practice is __data augmentation__, in which random transformations are applied to inputs during training. This helps in two ways:
* Often, datasets are relatively small and imperfectly represent the population from which they are sampled. Data augmentation effectively expands the size of the dataset by sampling additional randomized variations of each instance.
* We typically want to train a model that is __robust__ against common real-world transformations of its inputs -- that is, a model whose predictions are __invariant__ under these transformations. Data augmentation exposes our model to a chosen set of transformations during training so that it can learn to "see past" them.

TorchVision provides a number of `Transform` objects designed to perform data augmentation, making it easy to apply transformations automatically when data is fetched from a `Dataset` object.

In [None]:
# directory for MNIST dataset
data_dir = "./data/"

# initialize a Transform object to prepare our data
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    lambda x: x>0,
    lambda x: x.float(),
])

# load MNIST "train" dataset from disk
mnist_train = datasets.MNIST(data_dir, train=True, download=True, transform=transform)

# fetch an image from the MNIST dataset
example_img, example_label = mnist_train[300]
plt.imshow(example_img.squeeze(), cmap='gray')
plt.show()

# perform a random affine transformation of an input (rotation, translation, shear)
affine_aug = torchvision.transforms.RandomAffine(degrees=(-30, 30), translate=(0.25, 0.25), shear=(-45, 45))
augmented = affine_aug(example_img)
plt.imshow(augmented.squeeze(), cmap='gray')
plt.show()

Because we're effectively increasing the size of the dataset, and due to the computation required to perform each transformation, training with data augmentation may take more time (as measured in both wall-clock time and iterations). It's also worth noting that augmentations are typically applied to the training data only. While we won't go into detail at the moment, feel free to try training with any of the [augmentations offered by TorchVision](https://pytorch.org/vision/stable/transforms.html). You can add augmentations to the training loop above by editing the `transform` object:

```
# initialize a Transform object to prepare our data
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    lambda x: x>0,
    lambda x: x.float(),
    torchvision.transforms.RandomAffine(degrees=(-30, 30), translate=(0.25, 0.25), shear=(-45, 45))  # just append transforms!
])
```

## __4.__ <a name="logging">Logging</a>

In our training loop, we print running summaries of our model's training performance in order to monitor its progress. This is somewhat clunky and limited - what if we want to plot accuracy in real time, visualize challenging instances, dynamically change what information is displayed, or document and compare across multiple training runs? All these tasks fall under the umbrella of __logging__, and once again, PyTorch provides utilities to simplify the process. We can use PyTorch's built-in TensorBoard support to configure and view training logs without the need for any external database or visualization software. To launch TensorBoard within the notebook, run the cell below:

In [None]:
# here, we'll initialize TensorBoard. You should see an empty window in this cell, which will populate with
# graphs as soon as we run our training code below.
%load_ext tensorboard
%tensorboard --logdir logs

Next, we'll rewrite our training loop to log loss and accuracy values to TensorBoard in addition to printing them.

In [None]:
import datetime
from pathlib import Path
from torch.utils.tensorboard import SummaryWriter

# save all log data to a local directory
run_dir = "logs"

# timestamp the logs for each run so we can sort through them 
run_time = datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y")

# initialize a SummaryWriter object to handle all logging actions
logger = SummaryWriter(log_dir=Path(run_dir) / run_time)

def training_loop(save_path, 
                  epochs, 
                  batch_size, 
                  device="cpu", 
                  use_conv=False,
                  logger=None
                  ):
    """
    Train a neural network model for digit recognition on the MNIST dataset.
    
    Parameters
    ----------
    save_path (str):        path/filename for model checkpoint, e.g. 'my_model.pt'
    
    epochs (int):           number of iterations through the whole dataset for training
    
    batch_size (int):       size of a single batch of inputs
    
    device (str):           device on which tensors are placed; should be 'cpu' or 'cuda'. 

    use_conv (bool):        if True, use ConvNetwork; else, use LinearNetwork.

    logger (SummaryWriter): a TensorBoard logger
    
    Returns
    -------
    model (nn.Module): final trained model
    
    save_path (str):   path/filename for model checkpoint, so that we can load our model
                       later to test on unseen data
    
    device (str):      the device on which we carried out training, so we can match it
                       when we test the final model on unseen data later
    """

    # initialize model
    if use_conv:
      model = ConvNetwork()
      print('Training convolutional neural network...')
    else:
      model = LinearNetwork()
      print('Training fully-connected neural network...')

    print(f'Parameters in model: {param_count(model)}')
    model.to(device)

    # initialize an optimizer to update our model's parameters during training
    if use_conv:
      optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)
    else:
      optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

    # make a new directory in which to download the MNIST dataset
    data_dir = "./data/"
    
    # initialize a Transform object to prepare our data
    transform = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor(),
        lambda x: x>0,
        lambda x: x.float(),
    ])

    # load MNIST "test" dataset from disk
    mnist_test = datasets.MNIST(data_dir, train=False, download=True, transform=transform)

    # load MNIST "train" dataset from disk and set aside a portion for validation
    mnist_train_full = datasets.MNIST(data_dir, train=True, download=True, transform=transform)
    mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])

    # initialize a DataLoader object for each dataset
    train_dataloader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
    val_dataloader = torch.utils.data.DataLoader(mnist_val, batch_size=batch_size, shuffle=False)
    test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=1, shuffle=False)

    # a PyTorch categorical cross-entropy loss object
    loss_fn = torch.nn.CrossEntropyLoss()

    # time training process
    st = time.time()

    # keep track of best validation accuracy; if improved upon, save checkpoint
    best_acc = 0.0

    # time to start training!
    for epoch_idx, epoch in enumerate(range(epochs)):

        # loop through the entire dataset once per epoch
        train_loss = 0.0
        train_acc = 0.0
        train_total = 0
        model.train()
        for batch_idx, batch in enumerate(train_dataloader):

            # clear gradients
            optimizer.zero_grad()

            # unpack data and labels
            x, y = batch
            x = x.to(device)  # move inputs to the device the model lives on
            y = y.to(device)  # move labels to the same device

            # generate predictions and compute loss
            output = model(x)  # (batch_size, 10)
            loss = loss_fn(output, y)

            # compute accuracy
            preds = output.argmax(dim=1)
            acc = preds.eq(y).sum().item()/len(y)

            # compute gradients and update model parameters
            loss.backward()
            optimizer.step()

            # update statistics
            train_loss += (loss.item() * len(x))  # .item() gives a plain float, detached from the graph
            train_acc += (acc * len(x))
            train_total += len(x)

        train_loss /= train_total
        train_acc /= train_total

        ########################################################################
        # NEW: log to TensorBoard
        ########################################################################

        if logger is not None:
          logger.add_scalar("train_loss", train_loss, epoch_idx)
          logger.add_scalar("train_acc", train_acc, epoch_idx)

        # perform validation once per epoch
        val_loss = 0.0
        val_acc = 0.0
        val_total = 0
        model.eval()
        for batch_idx, batch in enumerate(val_dataloader):

            # don't compute gradients during validation
            with torch.no_grad():

                # unpack data and labels
                x, y = batch
                x = x.to(device)  # move inputs to the device the model lives on
                y = y.to(device)  # move labels to the same device

                # generate predictions and compute loss
                output = model(x)
                loss = loss_fn(output, y)

                # compute accuracy
                preds = output.argmax(dim=1)
                acc = preds.eq(y).sum().item()/len(y)

                # update statistics
                val_loss += (loss.item() * len(x))  # .item() gives a plain float
                val_acc += (acc * len(x))
                val_total += len(x)

        val_loss /= val_total
        val_acc /= val_total

        ########################################################################
        # NEW: log to TensorBoard
        ########################################################################
        
        if logger is not None:
          logger.add_scalar("val_loss", val_loss, epoch_idx)
          logger.add_scalar("val_acc", val_acc, epoch_idx)
        
        print(f"Epoch {epoch_idx + 1}: val loss {val_loss :0.3f}, val acc {val_acc :0.3f}, train loss {train_loss :0.3f}, train acc {train_acc :0.3f}")

        if val_acc > best_acc:
            print(f"New best accuracy {val_acc : 0.3f} (old {best_acc : 0.3f}); saving model weights to {save_path}")
            best_acc = val_acc
            torch.save(model.state_dict(), save_path)

    print(f"Total training time (s): {time.time() - st :0.3f}")
    
    return model, save_path, device
        

In [None]:
# run our training loop
model, save_path, device = training_loop(
    save_path="mnist_review.pt", 
    epochs=10, 
    batch_size=60, 
    device="cuda" if torch.cuda.is_available() else "cpu",
    use_conv=True,
    logger=logger
)

We can also run TensorBoard from the terminal, in which case we can view the logs in a browser by navigating to the correct port on our `localhost`. In the example below, after running the command we would need to point our browser to `localhost:9999`.

```
$ tensorboard --logdir /path/to/logging/directory/ --port 9999
```

If no port is given, TensorBoard will default to 6006. In fact, the logs from your experiment above should already be visible at `localhost:6006`. TensorBoard will continue serving on this port until the notebook kernel shuts down or you halt the terminal command (e.g. using `ctrl` + `c`), at which point you will not be able to view your logs until you re-start TensorBoard.
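To compare several training runs side by side, one common approach is to give each run its own subdirectory under the log directory; TensorBoard then shows each subdirectory as a separate run. The sketch below illustrates the idea. The run names and accuracy values are made-up placeholders, and a temporary directory stands in for `logs`.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# hypothetical log directory; in this notebook it would be "logs"
log_root = tempfile.mkdtemp()

# made-up accuracy curves, for illustration only
fake_runs = {
    "run_a": [0.90, 0.95, 0.97],
    "run_b": [0.85, 0.92, 0.96],
}

for run_name, accuracies in fake_runs.items():
    # one SummaryWriter per run, each writing to its own subdirectory
    writer = SummaryWriter(log_dir=os.path.join(log_root, run_name))
    for epoch_idx, acc in enumerate(accuracies):
        # same tag for both runs, so the curves share one plot
        writer.add_scalar("val_acc", acc, epoch_idx)
    writer.close()  # flush pending events to disk

print(sorted(os.listdir(log_root)))  # ['run_a', 'run_b']
```

Pointing `--logdir` at the parent directory should then display both curves on the same `val_acc` chart, with a checkbox per run.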