# Notebook 8: RNNs

In this notebook, we'll look at __Recurrent Neural Networks (RNNs)__ for a name/language classification task. First, we'll implement an RNN as a simple _cell_ using a for-loop. Then, we'll see how PyTorch's built-in modules allow us to deal with RNNs as conventional _layers_ in a network.

This notebook borrows from several tutorials. If you want to dive into RNNs, check out:
* [PyTorch RNNs from "Scratch"](https://jaketae.github.io/study/pytorch-rnn/)  
* [PyTorch RNN Classification Tutorial](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)  
* [The Unreasonable Effectiveness of RNNs](https://karpathy.github.io/2015/05/21/rnn-effectiveness/)  

The notebook is broken up as follows:

  1. [Setup](#setup)  
  2. [Data Preparation](#data)  
  3. [A Simple RNN Cell](#cell)  
  4. [A Simple RNN Layer](#layer)  

## __1.__ <a name="setup">Setup</a>

In [1]:
# helper code from the course repository
!git clone https://github.com/interactiveaudiolab/course-deep-learning.git
# install common packages used for deep learning
!cd course-deep-learning/ && pip install -r requirements.txt

Cloning into 'course-deep-learning'...
remote: Enumerating objects: 391, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 391 (delta 4), reused 10 (delta 3), pack-reused 379
Receiving objects: 100% (391/391), 138.20 MiB | 22.79 MiB/s, done.
Resolving deltas: 100% (183/183), done.
Collecting ipython>=7.0
  Downloading ipython-7.33.0-py3-none-any.whl (793 kB)
     |████████████████████████████████| 793 kB 5.3 MB/s 
Collecting numpy<=1.21
  Downloading numpy-1.21.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     |████████████████████████████████| 15.7 MB 2.1 MB/s 
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
  Downloading prompt_toolkit-3.0.29-py3-none-any.whl (381 kB)
     |████████████████████████████████| 381 kB 61.0 MB/s 
Installing collected packages: numpy, prompt-toolkit, ipython
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.6
    Unin

In [2]:
# download a dataset of names/languages 
%cd course-deep-learning/
!wget https://download.pytorch.org/tutorial/data.zip
!unzip data.zip

/content/course-deep-learning
--2022-05-22 22:40:27--  https://download.pytorch.org/tutorial/data.zip
Resolving download.pytorch.org (download.pytorch.org)... 18.64.174.23, 18.64.174.109, 18.64.174.119, ...
Connecting to download.pytorch.org (download.pytorch.org)|18.64.174.23|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2882130 (2.7M) [application/zip]
Saving to: ‘data.zip’


2022-05-22 22:40:27 (21.6 MB/s) - ‘data.zip’ saved [2882130/2882130]

Archive:  data.zip
   creating: data/
  inflating: data/eng-fra.txt        
   creating: data/names/
  inflating: data/names/Arabic.txt   
  inflating: data/names/Chinese.txt  
  inflating: data/names/Czech.txt    
  inflating: data/names/Dutch.txt    
  inflating: data/names/English.txt  
  inflating: data/names/French.txt   
  inflating: data/names/German.txt   
  inflating: data/names/Greek.txt    
  inflating: data/names/Irish.txt    
  inflating: data/names/Italian.txt  
  inflating: data/names/Japanese.txt  
 

In [3]:
%matplotlib inline

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from torchsummary import summary
from tqdm import tqdm
import random

from io import open
import glob
import os
import unicodedata
import string

## __2.__ <a name="data">Data Preparation</a>

In this notebook, we'll train an RNN to predict the language from which a given name originates. To do so, we'll need to create a dataset mapping names to languages. We'll start by creating a dictionary in which the keys are names of languages, and the values are lists of names from the corresponding language.

In [4]:
def create_names_dict(data_dir: str):
  """
  Given a directory containing name/language data, create a dictionary mapping
  each language to a list of string names
  """

  # data is stored in text files, by language
  name_files = glob.glob(f'{data_dir}/names/*.txt')
  
  # alphabet of allowed ASCII characters (letters plus a few punctuation marks)
  letters = string.ascii_letters + " .,;'"

  # convert unicode strings to ASCII strings; see https://stackoverflow.com/a/518232/2809427
  def unicodeToAscii(s):
      return ''.join(
          c for c in unicodedata.normalize('NFD', s)
          if unicodedata.category(c) != 'Mn'
          and c in letters
      )
  
  # prepare to store data
  dataset = {}

  for name_file in name_files:

    # keep track of all possible languages (labels)
    language = os.path.splitext(os.path.basename(name_file))[0]
    
    # read in names for given language
    with open(name_file, encoding='utf-8') as f:
      lines = f.read().strip().split('\n')
    lines = [unicodeToAscii(line) for line in lines]
    dataset[language] = lines

  return dataset
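
To make the normalization step concrete, here is a minimal standalone sketch of the same idea using only the standard library. The function name `unicode_to_ascii` and the example name are chosen for illustration; the alphabet matches the one defined in the cell above.

```python
import string
import unicodedata

# same alphabet as above: 52 ASCII letters plus 5 punctuation characters
letters = string.ascii_letters + " .,;'"

def unicode_to_ascii(s: str) -> str:
    """Strip combining marks (category 'Mn') and drop characters outside `letters`."""
    # NFD splits an accented character into a base letter plus combining marks
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(
        c for c in decomposed
        if unicodedata.category(c) != 'Mn' and c in letters
    )

ascii_name = unicode_to_ascii('Ślusàrski')
print(ascii_name)    # Slusarski
print(len(letters))  # 57
```

Note that any character that has no ASCII base letter in `letters` is silently dropped, so names can become shorter after conversion.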

In [5]:
names_dict = create_names_dict("data")
languages = list(names_dict.keys())

print(f"Languages: {languages}")
print(f"Example names for {languages[0]}: {names_dict[languages[0]][:10]}")

Languages: ['Japanese', 'Irish', 'Czech', 'Spanish', 'Italian', 'Arabic', 'Portuguese', 'Vietnamese', 'German', 'Dutch', 'Chinese', 'Korean', 'Scottish', 'Greek', 'English', 'Russian', 'Polish', 'French']
Example names for Japanese: ['Abe', 'Abukara', 'Adachi', 'Aida', 'Aihara', 'Aizawa', 'Ajibana', 'Akaike', 'Akamatsu', 'Akatsuka']


This is a great start, but there's one issue -- our data are strings! Just like in Notebook 3, we'll have to reformat our data as `torch.Tensor` objects in order to pass them to a neural network. How can we encode string data in a tensor format?

We could represent characters as integers, and store each name as an integer-valued vector. The problem is, if we feed such a numeric representation to our network directly -- e.g., to a linear layer -- it will attempt to infer ordinal relationships between these integer encodings that don't exist!


```
> name = torch.as_tensor([2, 0, 17, 11])  # "CARL"
...
>> prediction = my_network(name)
...
>>> hidden_state = my_linear_layer(name)
...
>>>> output = weight_matrix @ name
...
>>>>> result = 10.04 * 2 - 3.70 * 0 + 2.32 * 17 + 0.25 * 11
```

One solution is to use __one-hot encodings__. 

<br/>
<center>
<img width="600px" src="https://drive.google.com/uc?export=view&id=109z3YkGtMnlxSHK6gxHhwub5bT3yGkuG"/>
</center>
<br/>


This allows us to represent inputs numerically (so that we can pass them to a neural network), but without implying any ordinal relationship between, say, `"C"` and `"A"`.
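
As a small illustration of why one-hot vectors avoid spurious ordinal structure, the sketch below multiplies a randomly initialized weight matrix by two one-hot vectors. The matrix is a stand-in for a bias-free linear layer and is not part of this notebook's model; `hidden_size = 4` is an arbitrary choice for readability. Each product simply selects one column of the weights, so every letter gets its own independent set of parameters.

```python
import torch

torch.manual_seed(0)

vocab_size = 57   # number of possible letters, as in this notebook
hidden_size = 4   # hypothetical layer width, kept small for readability

# hypothetical weight matrix of a bias-free linear layer
weight = torch.randn(hidden_size, vocab_size)

# one-hot vectors for "c" (index 2) and "a" (index 0)
c = torch.zeros(vocab_size)
c[2] = 1.0
a = torch.zeros(vocab_size)
a[0] = 1.0

# multiplying by a one-hot vector just picks out one column of the weights
out_c = weight @ c
out_a = weight @ a

print(torch.allclose(out_c, weight[:, 2]))  # True
print(torch.allclose(out_a, weight[:, 0]))  # True
```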

In [6]:
def string_to_tensor(s: str):

  # to be safe, cast to lower case
  s = s.lower()

  # store alphabet of possible characters
  letters = string.ascii_letters + " .,;'"

  # initialize tensor to hold one-hot encoding: (L, 1, V)
  t = torch.zeros(len(s), 1, len(letters))

  # loop through string and encode each character
  for i, c in enumerate(s):
    idx = letters.find(c)
    if idx >= 0:  # characters outside the alphabet are left as all-zero rows
      t[i][0][idx] = 1

  return t
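
The same encoding can also be produced without an explicit loop using `torch.nn.functional.one_hot`. The following is a sketch that assumes the name contains only characters from the alphabet; it does not handle unknown characters, for which `str.find` returns -1.

```python
import string

import torch
import torch.nn.functional as F

letters = string.ascii_letters + " .,;'"

name = "carl"

# integer index of each character in the alphabet
indices = torch.tensor([letters.find(ch) for ch in name])

# one_hot returns an integer tensor of shape (L, V);
# cast to float and add a batch dimension to get (L, 1, V)
one_hot = F.one_hot(indices, num_classes=len(letters)).float().unsqueeze(1)

print(indices.tolist())  # [2, 0, 17, 11]
print(one_hot.shape)     # torch.Size([4, 1, 57])
```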

In [7]:
s = "CARL"
t = string_to_tensor(s)
print(f"Resulting one-hot-encoded tensor for string `{s}` has shape {t.shape}")
t

Resulting one-hot-encoded tensor for string `CARL` has shape torch.Size([4, 1, 57])


tensor([[[0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0.]],

        [[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
          0., 0., 0., 0., 0., 0

Now that we've encoded our inputs numerically, we can also encode our targets (languages). This is a simple classification task -- given one input name, we want to predict one target language -- and so we can train with [cross-entropy loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html), just as we have before. Recall that this loss function expects a __prediction__ in the form of a vector of class scores, and a __ground-truth__ target in the form of an integer index. So we just need to map languages to integers.
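
As a quick check of the input format that `nn.CrossEntropyLoss` expects, the sketch below passes a random score vector and an integer target, then recomputes the loss by hand as the negative log-softmax score of the target class. The scores and the target index `3` are arbitrary placeholders, not values from the dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_languages = 18

scores = torch.randn(1, n_languages)  # prediction: (B, n_classes) raw class scores
target = torch.tensor([3])            # ground truth: (B,) integer class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(scores, target)

# equivalent manual computation for a single example
manual = -torch.log_softmax(scores, dim=-1)[0, target.item()]

print(torch.allclose(loss, manual))  # True
```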

In [8]:
def create_final_dataset(data_dir: str):
  """
  Create a names/languages dataset in which both inputs (names) and 
  targets (languages) are encoded in tensor format.
  """

  # create string dataset
  names_dict = create_names_dict(data_dir)
  languages = list(names_dict.keys())

  # convert names to tensor form (one-hot)
  names_dict = {k: [string_to_tensor(s) for s in v] for k, v in names_dict.items()}

  # convert languages to tensor form (integer indices)
  dataset = {
      torch.tensor([languages.index(l)], dtype=torch.long) : names_dict[l] for l in languages
  }

  return dataset

At this stage, our network's inputs will be tensors with shape `(L, B, V)`, where
  * `L` is the sequence length, i.e. the number of letters in the name
  * `B` is the batch size (for now, 1, as we will be predicting over one name at a time)
  * `V` is the vocabulary size, i.e. the number of possible letters (and thus the size of one-hot encoded vectors)

This is a bit different from our image-domain datasets, where we dealt with inputs of shape `(B, C, H, W)` or `(B, C, H*W)`. As we'll see below, it is often convenient to put the __sequence dimension__ first when training RNNs due to the looping that occurs over this dimension. However, we'll also see how it's possible to use the more familiar __batch-first__ data format.
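
The two layouts can be sketched as follows, using a placeholder tensor of zeros in place of a real encoded name. Iterating over a sequence-first tensor yields one `(B, V)` slice per time step, and `permute` converts between the two formats.

```python
import torch

L, B, V = 4, 1, 57  # sequence length, batch size, vocabulary size

x_seq_first = torch.zeros(L, B, V)  # placeholder for a one-hot-encoded name

# iterating over the first dimension yields one time step at a time
step_shapes = [tuple(x_t.shape) for x_t in x_seq_first]

# swap the first two dimensions to get batch-first format
x_batch_first = x_seq_first.permute(1, 0, 2)

print(step_shapes[0])       # (1, 57)
print(x_batch_first.shape)  # torch.Size([1, 4, 57])
```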

## __3.__ <a name="cell">A Simple RNN Cell</a>

It's now time to define our network. Note that our dataset consists of __variable-length__ tensors representing names (inputs) and single-element tensors representing languages (targets). Whereas a feedforward network requires fixed-length inputs, we can define an RNN model that loops over the sequence (length) dimension of inputs before finally producing a prediction. At each time step, we'll update the RNN's __hidden state__ using both the input for that time step and the hidden state from the previous time step. At the final time step, we'll produce an __output state__ (prediction) in the form of a vector of class scores. This task corresponds to the third diagram from the left below ("many to one"). We'll explore some of the other recurrent tasks later.

<br/>
<center>
<img width="600px" src="https://karpathy.github.io/assets/rnn/diags.jpeg"/>
</center>
<br/>

<p>
<center>
Image source: "The Unreasonable Effectiveness of Recurrent Neural Networks" (Karpathy)
</center>
</p>


This "raw" definition of an RNN, in which we handle the looping and state-update process manually, is often referred to as a __cell__. Later, we'll see how we can also use PyTorch to abstract away these details and treat RNNs as more conventional __layers__ in a network.

In [9]:
class RNNCell(nn.Module):

  def __init__(self, 
                input_size: int, 
                hidden_size: int, 
                output_size: int):
      
    super().__init__()

    self.hidden_size = hidden_size

    # weight matrix 1: (inputs, previous hidden state) --> hidden state
    self.i2h = nn.Linear(input_size + hidden_size, hidden_size)

    # weight matrix 2: (inputs, previous hidden state) --> output state
    self.i2o = nn.Linear(input_size + hidden_size, output_size)

  def forward(self, input, hidden):
    """
    At each time step, concatenate the inputs for the current time step
    with the hidden state from the previous time step. From this combined
    vector, compute both the new hidden state and the current output state
    (which we'll only use at our final time step).
    """
    
    # ensure that input represents only a single time-step
    if input.ndim > 2:
      L, B, V, *_ = input.shape
      assert L == 1
      input = input.squeeze(0)  # reduce to (B, V)

    # concatenate inputs and previous hidden state
    combined = torch.cat((input, hidden), -1)  # assume "feature" dimension is last
    hidden = self.i2h(combined)
    output = self.i2o(combined)
    return output, hidden
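
Concatenating the input and the hidden state before a single linear layer is equivalent to applying a separate weight matrix to each and summing the results. The sketch below checks this numerically for a layer like `i2h`; the sizes and the names `W_x` and `W_h` are illustrative and do not appear in the cell above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size = 57, 8  # illustrative sizes

i2h = nn.Linear(input_size + hidden_size, hidden_size)

x = torch.randn(1, input_size)   # input for one time step: (B, V)
h = torch.randn(1, hidden_size)  # previous hidden state: (B, H)

# approach used in the cell: concatenate, then apply one linear layer
combined = i2h(torch.cat((x, h), dim=-1))

# equivalent view: split the weight matrix into input and hidden blocks
W_x = i2h.weight[:, :input_size]
W_h = i2h.weight[:, input_size:]
separate = x @ W_x.T + h @ W_h.T + i2h.bias

print(torch.allclose(combined, separate, atol=1e-5))  # True
```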

Before we train in earnest, let's see how our RNN cell processes a single input sequence.

In [10]:
hidden_size = 64
n_languages = 18
n_letters = 57

rnncell = RNNCell(
    input_size=n_letters,    # inputs are one-hot-encoded (one entry per possible letter)
    hidden_size=hidden_size, # pick a large enough hidden size
    output_size=n_languages  # output a vector of class scores, one per language
)

# an example input
x = string_to_tensor("CARL")
print(f'Input shape: {x.shape}\n')

# feed input to RNN cell: loop over sequence dimension
for i, x_i in enumerate(x):

  print(f'Input shape for time step {i + 1}: {x_i.shape}')

  # at first time step, RNN cell has no previous hidden state!
  # we need to initialize one to feed in
  if not i:
    hidden_state = torch.zeros(1, rnncell.hidden_size)
  
  # pass in inputs for current time step and previous hidden state
  output_state, hidden_state = rnncell(x_i, hidden_state)

  print(f'Output state and hidden state shapes after update: {output_state.shape}, {hidden_state.shape}\n')

# our final output state will be our prediction!
print(f'Final output state (prediction) shape: {output_state.shape}')


Input shape: torch.Size([4, 1, 57])

Input shape for time step 1: torch.Size([1, 57])
Output state and hidden state shapes after update: torch.Size([1, 18]), torch.Size([1, 64])

Input shape for time step 2: torch.Size([1, 57])
Output state and hidden state shapes after update: torch.Size([1, 18]), torch.Size([1, 64])

Input shape for time step 3: torch.Size([1, 57])
Output state and hidden state shapes after update: torch.Size([1, 18]), torch.Size([1, 64])

Input shape for time step 4: torch.Size([1, 57])
Output state and hidden state shapes after update: torch.Size([1, 18]), torch.Size([1, 64])

Final output state (prediction) shape: torch.Size([1, 18])


Now we have everything we need to train our RNN. Let's see if this thing can learn to map names to languages! We'll let our enthusiasm get the best of us and ignore validation for now.

In [11]:
def rnn_predict(x: torch.Tensor, rnncell: nn.Module):
  """Loop along sequence dimension of given input to get prediction"""

  assert x.ndim == 3  # (L, B, V)

  # feed input to RNN cell: loop over sequence dimension
  for i, x_i in enumerate(x):

    # at first time step, RNN cell has no previous hidden state!
    # we need to initialize one to feed in
    if not i:
      hidden_state = torch.zeros(1, rnncell.hidden_size)
  
    # pass in inputs for current time step and previous hidden state
    output_state, hidden_state = rnncell(x_i, hidden_state)

  # return final output state
  return output_state


In [12]:
n_languages = 18
n_letters = 57
hidden_size = 256

rnncell = RNNCell(
    input_size=n_letters,    # inputs are one-hot-encoded (one entry per possible letter)
    hidden_size=hidden_size, # pick a large enough hidden size
    output_size=n_languages  # output a vector of class scores, one per language
)
rnncell.train()  # training mode

dataset = create_final_dataset("data")
languages = list(dataset.keys())

criterion = nn.CrossEntropyLoss()
lr = 0.005

optimizer = torch.optim.SGD(rnncell.parameters(), lr=lr)

max_iter = 100_000

pbar = tqdm(range(max_iter), total=max_iter)
for i in pbar:

  # select a random training example: random language and random name
  language = random.choice(languages)
  name = random.choice(dataset[language])

  optimizer.zero_grad(set_to_none=True)

  # compute prediction (scores for each language)
  prediction = rnn_predict(name, rnncell)

  # compute loss
  loss = criterion(prediction, language)
  loss.backward()

  optimizer.step()

  # we'll keep our logging minimal for now -- just print loss periodically
  if not i % 1000:
    pbar.set_description(f'loss at iter {i}: {loss.item() :0.4f}')

loss at iter 99000: 1.4660: 100%|██████████| 100000/100000 [03:02<00:00, 546.91it/s]


Let's check out our trained network's predictions.

In [13]:
rnncell.eval()

language_names = list(create_names_dict("data").keys())

name = "Telemakos"

def get_language_name(pred: torch.Tensor):
  return language_names[pred.argmax().item()]

pred = rnn_predict(string_to_tensor(name), rnncell)
print(f'Predicted language for name {name}: {get_language_name(pred)}')

Predicted language for name Telemakos: Greek


The purpose of this exercise isn't really to train an accurate language predictor, but rather to get familiar with the mechanics of operating an RNN. PyTorch has built-in implementations of a variety of RNN cell types, such as LSTM. 

However, some of these cells behave differently. For example, LSTM expects both a hidden state _and_ a cell state at each time step. Moreover, the LSTM implementation does not include a linear layer to project the internal cell state to an output state of the required size (in our case, the number of languages). We'll have to tweak our code and handle this projection externally in our prediction function.
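
To see the state handling in isolation, here is a minimal sketch of a single `nn.LSTMCell` step. The sizes are illustrative and the input is random rather than a real encoded letter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_letters, hidden_size = 57, 64  # illustrative sizes

cell = nn.LSTMCell(input_size=n_letters, hidden_size=hidden_size)

x_t = torch.randn(1, n_letters)   # input for one time step: (B, V)
h0 = torch.zeros(1, hidden_size)  # initial hidden state
c0 = torch.zeros(1, hidden_size)  # initial cell state

# one step: returns the next hidden state and the next cell state
h1, c1 = cell(x_t, (h0, c0))

# if no state is passed, nn.LSTMCell starts from zero-valued states
h_default, c_default = cell(x_t)

print(h1.shape, c1.shape)  # torch.Size([1, 64]) torch.Size([1, 64])
```

Neither returned tensor has the size of the label set, which is why a separate projection layer is needed to obtain class scores.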

In [14]:
class LSTMCell(nn.Module):

  def __init__(self, input_size: int, hidden_size: int, output_size: int):

    super().__init__()

    self.hidden_size = hidden_size

    self.cell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size)
    self.out_proj = nn.Linear(hidden_size, output_size)

  def forward(self, x: torch.Tensor, state: tuple):

    # unpack prior state: hidden and cell states
    hx, cx = state

    return self.cell(x, (hx, cx))  # (h_x, c_x) for next time step
  
  def project(self, cell_state: torch.Tensor):

    return self.out_proj(cell_state)

In [15]:
def lstm_predict(x: torch.Tensor, lstmcell: nn.Module):
  """Loop along sequence dimension of given input to get prediction"""

  assert x.ndim == 3  # (L, B, V)

  # feed input to RNN cell: loop over sequence dimension
  for i, x_i in enumerate(x):

    # at first time step, RNN cell has no previous hidden state!
    # we need to initialize one to feed in
    if not i:
      hidden_state = torch.zeros(1, lstmcell.hidden_size)
      cell_state = torch.zeros(1, lstmcell.hidden_size)
  
    # pass in inputs for current time step and previous hidden state
    hidden_state, cell_state = lstmcell(x_i, (hidden_state, cell_state))

  # project final cell state to a vector of class scores
  output_state = lstmcell.project(cell_state)

  # return final output state
  return output_state


In [None]:
n_languages = 18
n_letters = 57
hidden_size = 256

lstmcell = LSTMCell(
    input_size=n_letters,    # inputs are one-hot-encoded (one entry per possible letter)
    hidden_size=hidden_size, # pick a large enough hidden size
    output_size=n_languages  # output a vector of class scores, one per language
)
lstmcell.train()  # training mode

dataset = create_final_dataset("data")
languages = list(dataset.keys())

criterion = nn.CrossEntropyLoss()
lr = 0.005

optimizer = torch.optim.SGD(lstmcell.parameters(), lr=lr)

max_iter = 100_000

pbar = tqdm(range(max_iter), total=max_iter)
for i in pbar:

  # select a random training example: random language and random name
  language = random.choice(languages)
  name = random.choice(dataset[language])

  optimizer.zero_grad(set_to_none=True)

  # compute prediction (scores for each language)
  prediction = lstm_predict(name, lstmcell)

  # compute loss
  loss = criterion(prediction, language)
  loss.backward()

  optimizer.step()

  # we'll keep our logging minimal for now -- just print loss periodically
  if not i % 1000:
    pbar.set_description(f'loss at iter {i}: {loss.item() :0.4f}')

Again, we won't perform an in-depth comparison of different RNN cell types here. The important takeaway is that __different RNN cell types maintain different kinds of internal states, and you may have to modify your code correspondingly.__

Before we move on, let's take a look at our LSTM's predictions:

In [17]:
lstmcell.eval()

language_names = list(create_names_dict("data").keys())

name = "Telemakos"

def get_language_name(pred: torch.Tensor):
  return language_names[pred.argmax().item()]

pred = lstm_predict(string_to_tensor(name), lstmcell)
print(f'Predicted language for name {name}: {get_language_name(pred)}')

Predicted language for name Telemakos: Greek


## __4.__ <a name="layer">A Simple RNN Layer</a>

In the code above, we dealt with RNNs in a "cell" format. This required us to manually loop over the sequence dimension of our data, passing both inputs and prior states at each time step. While this level of control can be useful, we will often want to deal with RNNs in the same simplified way as other network architectures we have seen this term. That is, we want to deal with RNNs as network __layers__ to which we can hand batched data, just like multi-layer perceptrons or convolutional networks.

Our first step will be to modify our data-loading code. We'll wrap our names and languages in a more conventional `Dataset` object.

In [57]:
class RNNDataset(torch.utils.data.Dataset):

  def __init__(self, data_dict: dict):
      super().__init__()

      # store inputs (variable-length tensors) and targets 
      # (single-element tensors) in list format
      self.x = []
      self.y = []

      for language, names in data_dict.items():

        self.x.extend(names)
        self.y.extend([language] * len(names))
    
  def __len__(self):
    """A required method of Dataset subclasses."""
    return len(self.y)

  def __getitem__(self, idx):
    """A required method of Dataset subclasses."""

    x = self.x[idx]
    y = self.y[idx]

    # create batch tensors by padding to maximum sequence length
    if isinstance(x, list):
      
      # we still want to keep track of each individual name's length
      lengths = [len(x_i) for x_i in x]

      # handle padding and tensor concatenation
      x = torch.nn.utils.rnn.pad_sequence(x)
      
      # reshaping inputs to batch-first format can be more legible
      x = x.squeeze(2).permute(1, 0, 2)

    else:

      # we still want to keep track of each individual name's length
      lengths = [len(x)]

      # reshaping inputs to batch-first format can be more legible
      x = x.permute(1, 0, 2)
    
    # we'll also put our targets in tensor format
    if isinstance(y, list):
      y = torch.cat(y, dim=0)    

    # return resulting tensors and actual lengths
    return x, y, lengths 


def collate_fn(batch):
  """
  Given a batch of tuples fetched from the dataset, make sure
  input tensors are padded to maximum length. This is essentially 
  what `torch.nn.utils.rnn.pad_sequence()` does -- but for the sake
  of learning, let's try it ourselves.
  """
  x, y, lengths = zip(*batch)

  _, _, v = next(iter(x)).shape
  b = len(x)
  max_l = max([x_i.shape[1] for x_i in x])

  x_padded = torch.zeros(b, max_l, v)

  for i, x_i in enumerate(x):
    x_padded[i, :x_i.shape[1], :] = x_i

  y = torch.cat(y)

  lengths = [next(iter(l)) for l in lengths]

  return x_padded, y, lengths


data_dict = create_final_dataset("data")
dataset = RNNDataset(data_dict)

# example batch
x, y, lengths = dataset[:10]

# inputs have shape (B, MAX_L, V) where MAX_L is largest length in batch
print(f'Example batch of 10 instances: x (shape {x.shape}), y (shape {y.shape}), lengths {lengths}')

Example batch of 10 instances: x (shape torch.Size([10, 8, 57])), y (shape torch.Size([10])), lengths [3, 7, 6, 4, 6, 6, 7, 6, 8, 8]
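
For comparison with the manual padding in `collate_fn`, the sketch below pads a few placeholder sequences with `torch.nn.utils.rnn.pad_sequence` and `batch_first=True`. It assumes each sequence already has shape `(L_i, V)`, without the singleton batch dimension used elsewhere in this notebook.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

v = 57  # vocabulary size

# three placeholder "names" of different lengths, each of shape (L_i, V)
seqs = [torch.ones(3, v), torch.ones(5, v), torch.ones(2, v)]
lengths = [s.shape[0] for s in seqs]

# pad with zeros up to the longest sequence: (B, MAX_L, V)
padded = pad_sequence(seqs, batch_first=True)

print(padded.shape)  # torch.Size([3, 5, 57])
print(lengths)       # [3, 5, 2]
```

Keeping `lengths` alongside the padded tensor is what later lets the model ignore the padded time steps.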


Now, we'll define our recurrent network using the layer paradigm.

In [30]:
class RNNLayer(nn.Module):

  def __init__(self, 
               input_size: int, 
               hidden_size: int, 
               output_size: int,
               num_layers: int = 1
               ):

    super().__init__()

    self.rnn = nn.RNN(
        input_size=input_size,
        hidden_size=hidden_size,
        num_layers=num_layers,
        batch_first=True
    )

    self.out_proj = nn.Linear(hidden_size, output_size)

  def forward(self, x: torch.Tensor, lengths: list):

    # require batched inputs: (B, MAX_L, V)
    assert x.ndim == 3
    b, l, v = x.shape

    # built-in PyTorch layer handles loop along sequence dimension,
    # including passing hidden state back each time step. It also 
    # handles creating a new initial state for each batch!
    output, hidden = self.rnn(x)

    # for each item in batch, take final output state (according to lengths)
    output = torch.stack([output[i][lengths[i] - 1] for i in range(b)])

    # apply final linear layer to get predictions
    output = self.out_proj(output)

    return output
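
An alternative to indexing the output by length is to pack the padded batch with `torch.nn.utils.rnn.pack_padded_sequence`, in which case the returned final hidden state already corresponds to each sequence's last valid time step. The sketch below, which uses random inputs and illustrative sizes, checks that both approaches agree for a single-layer `nn.RNN`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

b, max_l, v, hidden_size = 3, 5, 57, 16  # illustrative sizes
lengths = [5, 2, 4]                      # true length of each sequence

# random batch with zeros in the padded positions: (B, MAX_L, V)
x = torch.randn(b, max_l, v)
for i, n in enumerate(lengths):
    x[i, n:] = 0.0

rnn = nn.RNN(input_size=v, hidden_size=hidden_size, batch_first=True)

# approach used above: run on the padded batch, then index by length
output, _ = rnn(x)
last_by_index = torch.stack([output[i, lengths[i] - 1] for i in range(b)])

# alternative: pack first, so the RNN stops at each sequence's true end
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
_, h_n = rnn(packed)
last_by_packing = h_n[-1]  # final hidden state of the last layer: (B, H)

print(torch.allclose(last_by_index, last_by_packing, atol=1e-5))  # True
```

Packing can also avoid wasted computation on padded time steps, at the cost of slightly more bookkeeping.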

We'll finish up by training for the same language prediction task, this time using these more familiar network and dataset formats.

In [38]:
# initialize dataset
n_letters = 57
n_languages = 18
data_dict = create_final_dataset("data")
dataset = RNNDataset(data_dict)

# initialize data loader for random batching
loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=20,
    shuffle=True,
    collate_fn=collate_fn
)

# initialize network, optimizer, and loss function
model = RNNLayer(
    input_size=n_letters,
    hidden_size=256,
    output_size=n_languages,
    num_layers=2  # let's try 2 stacked RNN layers!
)
model.train()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# train!
epochs = 10

for epoch in range(epochs):

  correct = 0
  n = 0

  pbar = tqdm(enumerate(loader), total=len(loader))
  for j, batch in pbar:

    optimizer.zero_grad()

    x, y, lengths = batch

    predictions = model(x, lengths)

    loss = criterion(predictions, y)

    loss.backward()
    optimizer.step()

    pbar.set_description(f'Loss: {loss.item() :0.4f}')

    # track training accuracy per epoch
    correct += (predictions.argmax(dim=-1) == y).sum().item()
    n += len(y)
  
  print(f'Epoch {epoch + 1} training accuracy: {correct/n :0.4f}')

Loss: 1.219748854637146: 100%|██████████| 1004/1004 [00:18<00:00, 54.08it/s]


Epoch 1 training accuracy: 0.5564411676795855


Loss: 0.5989579558372498: 100%|██████████| 1004/1004 [00:17<00:00, 56.34it/s]


Epoch 2 training accuracy: 0.6723124439573578


Loss: 0.431468665599823: 100%|██████████| 1004/1004 [00:17<00:00, 56.28it/s]


Epoch 3 training accuracy: 0.710072730895686


Loss: 0.7258985638618469: 100%|██████████| 1004/1004 [00:17<00:00, 56.15it/s]


Epoch 4 training accuracy: 0.7239214904852047


Loss: 0.732782781124115: 100%|██████████| 1004/1004 [00:19<00:00, 52.25it/s]


Epoch 5 training accuracy: 0.7357278071136794


Loss: 0.34172964096069336: 100%|██████████| 1004/1004 [00:19<00:00, 52.21it/s]


Epoch 6 training accuracy: 0.7477333864700608


Loss: 0.7524484992027283: 100%|██████████| 1004/1004 [00:18<00:00, 54.09it/s]


Epoch 7 training accuracy: 0.7562020524060974


Loss: 0.7793447375297546: 100%|██████████| 1004/1004 [00:18<00:00, 54.54it/s]


Epoch 8 training accuracy: 0.7648699810700409


Loss: 1.0252188444137573: 100%|██████████| 1004/1004 [00:17<00:00, 56.27it/s]


Epoch 9 training accuracy: 0.7708976785892199


Loss: 0.861352801322937: 100%|██████████| 1004/1004 [00:17<00:00, 56.09it/s]

Epoch 10 training accuracy: 0.7781209524758393





Finally, let's try out our trained RNN model. We'll tweak our code slightly to accommodate our new assumptions about the data format.

In [56]:
model.eval()

language_names = list(create_names_dict("data").keys())

name = "Marcello"

def get_language_name(pred: torch.Tensor):
  return language_names[pred.argmax(dim=-1).item()]

pred = model(string_to_tensor(name).permute(1, 0, 2), [len(name)])

print(f'Predicted language for name {name}: {get_language_name(pred)}')

Predicted language for name Marcello: Italian
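
Beyond the single most likely language, it can be informative to look at the top few candidates and their probabilities. The sketch below shows one way to do this with `softmax` and `topk`; the score vector and the four-language list are made-up placeholders, not outputs of the trained model.

```python
import torch

# hypothetical subset of languages and hypothetical class scores
language_names = ['Greek', 'Italian', 'Spanish', 'Russian']
scores = torch.tensor([[0.2, 2.5, 1.1, -0.7]])  # (B, n_classes)

# convert raw scores to probabilities, then take the two largest
probs = torch.softmax(scores, dim=-1)
top_probs, top_idx = probs.topk(k=2, dim=-1)

top_languages = [language_names[i] for i in top_idx[0].tolist()]

print(top_languages)  # ['Italian', 'Spanish']
```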
