#%% [markdown]
# # Neural networks with PyTorch
#
# Next I'll show you how to build a neural network with PyTorch.

#%%
# Import things like usual

get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")

import numpy as np
import torch

import helper

import matplotlib.pyplot as plt
from torchvision import datasets, transforms

#%% [markdown]
# First up, we need to get our dataset. This is provided through the `torchvision` package. The code below will download the MNIST dataset, then create training and test datasets for us. Don't worry too much about the details here; you'll learn more about this later.

#%%
# Define a transform to normalize the data
# (MNIST images have a single channel, so Normalize takes one mean and one standard deviation)
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                               ])
# Download and load the training data
trainset = datasets.MNIST('MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.MNIST('MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)


#%%
dataiter = iter(trainloader)
images, labels = next(dataiter)

#%% [markdown]
# We have the training data loaded into `trainloader` and we make that an iterator with `iter(trainloader)`. We'd use this to loop through the dataset for training, but here I'm just grabbing the first batch so we can check out the data. We can see below that `images` is just a tensor with size (64, 1, 28, 28). So, 64 images per batch, 1 color channel, and 28x28 images.

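#%% [markdown]
# A quick look at the shapes confirms this:

#%%
print(images.shape)   # expect torch.Size([64, 1, 28, 28])
print(labels.shape)   # expect torch.Size([64])
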
#%%
plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r');

#%% [markdown]
# ## Building networks with PyTorch
#
# Here I'll use PyTorch to build a simple feedforward network to classify the MNIST images. That is, the network will receive a digit image as input and predict the digit in the image.
#
# <img src="assets/mlp_mnist.png" width=600px>
#
# To build a neural network with PyTorch, you use the `torch.nn` module. The network itself is a class inheriting from `torch.nn.Module`. You define each of the operations separately, like `nn.Linear(784, 128)` for a fully connected linear layer with 784 inputs and 128 units.
#
# The class needs to include a `forward` method that implements the forward pass through the network. In this method, you pass some input tensor `x` through each of the operations you defined earlier. The `torch.nn` module also has functional equivalents for things like ReLUs in `torch.nn.functional`. This module is usually imported as `F`. Then to use a ReLU activation on some layer (which is just a tensor), you'd do `F.relu(x)`. Below are a few different commonly used activation functions.
#
# <img src="assets/activation.png" width=700px>
#
# So, for this network, I'll build it with three fully connected layers, then a softmax output for predicting classes. The softmax function is similar to the sigmoid in that it squashes inputs between 0 and 1, but it's also normalized so that all the values sum to one like a proper probability distribution.

#%%
from torch import nn
from torch import optim
import torch.nn.functional as F


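#%% [markdown]
# Before building the network, here's a quick check of the softmax behavior just described: applying `F.softmax` along `dim=1` turns each row of arbitrary scores into values between 0 and 1 that sum to one.

#%%
scores = torch.randn(2, 5)          # two rows of arbitrary scores
probs = F.softmax(scores, dim=1)    # softmax across each row
print(probs)
print(probs.sum(dim=1))             # each row sums to 1
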
#%%
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Hidden layers with 128 and 64 units
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        # Output layer, 10 units - one for each digit
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        ''' Forward pass through the network, returns the class probabilities '''

        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = F.softmax(x, dim=1)

        return x

model = Network()
model

#%% [markdown]
# ### Initializing weights and biases
#
# The weights and such are automatically initialized for you, but it's possible to customize how they are initialized. The weights and biases are tensors attached to the layer you defined; you can get them with `model.fc1.weight` for instance.

#%%
print(model.fc1.weight)
print(model.fc1.bias)

#%% [markdown]
# For custom initialization, we want to modify these tensors in place. They are `nn.Parameter` objects (tensors tracked by autograd), so to change the values directly we work with the underlying tensors via `model.fc1.weight.data`. Once we have the tensors, we can fill them with zeros (for biases) or random normal values.

#%%
# Set biases to all zeros
model.fc1.bias.data.fill_(0)


#%%
# Sample from a random normal with standard dev = 0.01
model.fc1.weight.data.normal_(std=0.01)

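#%% [markdown]
# PyTorch also ships common initialization schemes in `torch.nn.init`; here's an optional sketch using Xavier initialization on the same layer. The simple `fill_`/`normal_` calls above are all we need for this notebook.

#%%
nn.init.xavier_uniform_(model.fc1.weight)   # Xavier/Glorot uniform initialization
nn.init.constant_(model.fc1.bias, 0)        # zero the biases again
print(model.fc1.weight)
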
#%% [markdown]
# ### Forward pass
#
# Now that we have a network, let's see what happens when we pass in an image. This is called the forward pass. We're going to reshape the image data into a 784-element vector, then pass it through the operations defined by the network architecture.

#%%
# Grab some data
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Resize images into a 1D vector, new shape is (batch size, color channels, image pixels)
images.resize_(64, 1, 784)
# or images.resize_(images.shape[0], 1, 784) to automatically get the batch size

# Forward pass through the network
img_idx = 0
ps = model.forward(images[img_idx,:])

img = images[img_idx]
helper.view_classify(img.view(1, 28, 28), ps)

#%% [markdown]
# As you can see above, our network has basically no idea what this digit is. That's because we haven't trained it yet; all the weights are random!
#
# PyTorch provides a convenient way to build networks like this where a tensor is passed sequentially through operations, `nn.Sequential` ([documentation](https://pytorch.org/docs/master/nn.html#torch.nn.Sequential)). Using this to build the equivalent network:

#%%
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)

# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
ps = model.forward(images[0,:])
helper.view_classify(images[0].view(1, 28, 28), ps)

#%% [markdown]
# You can also pass in an `OrderedDict` to name the individual layers and operations. Note that dictionary keys must be unique, so _each operation must have a different name_.

#%%
from collections import OrderedDict
model = nn.Sequential(OrderedDict([
                      ('fc1', nn.Linear(input_size, hidden_sizes[0])),
                      ('relu1', nn.ReLU()),
                      ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
                      ('relu2', nn.ReLU()),
                      ('output', nn.Linear(hidden_sizes[1], output_size)),
                      ('softmax', nn.Softmax(dim=1))]))
model

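#%% [markdown]
# With named operations, you can also get at each layer directly as an attribute of the model, for example:

#%%
print(model.fc1)
print(model.fc1.weight.shape)
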
#%% [markdown]
# Now it's your turn to build a simple network; use any of the methods I've covered so far. In the next notebook, you'll learn how to train a network so it can make good predictions.
#
# >**Exercise:** Build a network to classify the MNIST images with _three_ hidden layers. Use 400 units in the first hidden layer, 200 units in the second layer, and 100 units in the third layer. Each hidden layer should have a ReLU activation function, and use softmax on the output layer.

#%%
## TODO: Your network here


#%%
## Run this cell with your model to make sure it works ##
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
ps = model.forward(images[0,:])
helper.view_classify(images[0].view(1, 28, 28), ps)


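#%% [markdown]
# For reference, here's one possible solution using `nn.Sequential` with an `OrderedDict` (just a sketch; any of the approaches covered above works).

#%%
# One possible solution to the exercise: 784 -> 400 -> 200 -> 100 -> 10
model = nn.Sequential(OrderedDict([
                      ('fc1', nn.Linear(784, 400)),
                      ('relu1', nn.ReLU()),
                      ('fc2', nn.Linear(400, 200)),
                      ('relu2', nn.ReLU()),
                      ('fc3', nn.Linear(200, 100)),
                      ('relu3', nn.ReLU()),
                      ('output', nn.Linear(100, 10)),
                      ('softmax', nn.Softmax(dim=1))]))
model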