Commit 219e68c: Add all .py files
1 parent b6e0dc5 commit 219e68c

10 files changed, +1536 −0 lines changed
Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
#%% [markdown]
# # Introduction to Deep Learning with PyTorch
#
# In this notebook, you'll get introduced to [PyTorch](http://pytorch.org/), a framework for building and training neural networks. In many ways, PyTorch behaves like the arrays you know from Numpy. These Numpy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks. It also provides a module that automatically calculates gradients (for backpropagation!) and another module specifically for building neural networks. Altogether, PyTorch ends up being more coherent with Python and the Numpy/Scipy stack than TensorFlow and other frameworks.
#
#
#%% [markdown]
# ## Neural Networks
#
# Deep Learning is based on artificial neural networks, which have been around in some form since the late 1950s. The networks are built from individual parts approximating neurons, typically called units or simply "neurons." Each unit has some number of weighted inputs. These weighted inputs are summed together (a linear combination) then passed through an activation function to get the unit's output.
#
# <img src="assets/simple_neuron.png" width=400px>
#
# Mathematically this looks like:
#
# $$
# \begin{align}
# y &= f(w_1 x_1 + w_2 x_2 + b) \\
# y &= f\left(\sum_i w_i x_i + b \right)
# \end{align}
# $$
#
# With vectors this is the dot/inner product of two vectors:
#
# $$
# h = \begin{bmatrix}
# x_1 \, x_2 \cdots x_n
# \end{bmatrix}
# \cdot
# \begin{bmatrix}
# w_1 \\
# w_2 \\
# \vdots \\
# w_n
# \end{bmatrix}
# $$
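
#%% [markdown]
# As a minimal sketch, here's that single-neuron calculation written out with PyTorch tensors. The feature values, weights, and bias below are made-up numbers, just to show the linear combination and activation.

#%%
import torch

def sigmoid(t):
    # Sigmoid activation: squashes values into (0, 1)
    return 1 / (1 + torch.exp(-t))

features = torch.tensor([[0.5, -1.2, 3.0]])      # one sample, three inputs (made-up values)
weights = torch.tensor([[0.1], [0.4], [-0.3]])   # one weight per input, shaped as a column
bias = torch.tensor(0.2)

# y = f(x . w + b), the same linear combination as the equation above
y = sigmoid(torch.mm(features, weights) + bias)
print(y)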
#%% [markdown]
# ### Stack them up!
#
# We can assemble these individual neurons into layers and stacks of layers, forming a network of neurons. The output of one layer of neurons becomes the input for the next layer. With multiple input units and output units, we now need to express the weights as a matrix.
#
# <img src='assets/multilayer_diagram_weights.png' width=450px>
#
# We can express this mathematically with matrices again and use matrix multiplication to get linear combinations for each unit in one operation. For example, the hidden layer ($h_1$ and $h_2$ here) can be calculated as
#
# $$
# \vec{h} = [h_1 \, h_2] =
# \begin{bmatrix}
# x_1 \, x_2 \cdots \, x_n
# \end{bmatrix}
# \cdot
# \begin{bmatrix}
# w_{11} & w_{12} \\
# w_{21} & w_{22} \\
# \vdots & \vdots \\
# w_{n1} & w_{n2}
# \end{bmatrix}
# $$
#
# The output for this small network is found by treating the hidden layer as inputs for the output unit. The network output is expressed simply as
#
# $$
# y = f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
# $$
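
#%% [markdown]
# A quick sketch of that two-layer calculation with matrix multiplication. The inputs and weight matrices here are random values purely for illustration.

#%%
import torch

def sigmoid(t):
    return 1 / (1 + torch.exp(-t))

x = torch.randn(1, 3)        # one sample with 3 input features
W1 = torch.randn(3, 2)       # weights from 3 inputs to 2 hidden units
W2 = torch.randn(2, 1)       # weights from 2 hidden units to 1 output unit

h = sigmoid(torch.mm(x, W1))     # hidden layer activations, shape (1, 2)
y = sigmoid(torch.mm(h, W2))     # network output, shape (1, 1)
print(y)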
#%% [markdown]
# ## Tensors
#
# It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (RGB color images, for example). Tensors are the fundamental data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.
#
# <img src="assets/tensor_examples.svg" width=600px>
#
# With the basics covered, it's time to explore how we can use PyTorch to build a simple neural network.
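
#%% [markdown]
# To make the dimension counting concrete, a small sketch with arbitrary sizes:

#%%
import torch

vector = torch.rand(5)          # 1-dimensional tensor: 5 elements
matrix = torch.rand(5, 3)       # 2-dimensional tensor: 5 rows, 3 columns
image = torch.rand(3, 28, 28)   # 3-dimensional tensor, e.g. a 3-channel 28x28 image

print(vector.dim(), matrix.dim(), image.dim())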
#%%
get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")

import numpy as np
import torch

import helper
#%% [markdown]
# First, let's see how we work with PyTorch tensors. These are the fundamental data structures of neural networks and PyTorch, so it's important to understand how they work.

#%%
x = torch.rand(3, 2)
x

#%%
y = torch.ones(x.size())
y

#%%
z = x + y
z
#%% [markdown]
# In general, PyTorch tensors behave similarly to Numpy arrays. They are zero-indexed and support slicing.

#%%
z[0]

#%%
z[:, 1:]
#%% [markdown]
# Tensors typically have two forms of each method: one that returns a new tensor and one that performs the operation in place. That is, the values in memory for that tensor are changed without creating a new tensor. In-place methods always end with an underscore, for example `z.add()` versus `z.add_()`.

#%%
# Return a new tensor z + 1
z.add(1)

#%%
# z tensor is unchanged
z

#%%
# Add 1 and update z tensor in-place
z.add_(1)

#%%
# z has been updated
z
#%% [markdown]
# ### Reshaping
#
# Reshaping tensors is a really common operation. First, to get the size and shape of a tensor, use `.size()`. Then, to reshape a tensor, use `.resize_()`. Notice the underscore: reshaping with `.resize_()` is an in-place operation.

#%%
z.size()

#%%
z.resize_(2, 3)

#%%
z
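
#%% [markdown]
# If you'd rather not modify the tensor in place, `.view()` and `.reshape()` return a reshaped tensor and leave the original untouched. A quick sketch:

#%%
# New tensor with shape (3, 2); z itself keeps its current shape
z.view(3, 2)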
#%% [markdown]
# ## Numpy to Torch and back
#
# Converting between Numpy arrays and Torch tensors is super simple and useful. To create a tensor from a Numpy array, use `torch.from_numpy()`. To convert a tensor to a Numpy array, use the `.numpy()` method.

#%%
a = np.random.rand(4, 3)
a

#%%
b = torch.from_numpy(a)
b

#%%
b.numpy()

#%% [markdown]
# The memory is shared between the Numpy array and the Torch tensor, so if you change the values of one object in place, the other will change as well.

#%%
# Multiply PyTorch Tensor by 2, in place
b.mul_(2)

#%%
# Numpy array matches new values from Tensor
a
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
#%% [markdown]
# # Neural networks with PyTorch
#
# Next I'll show you how to build a neural network with PyTorch.

#%%
# Import things like usual

get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")

import numpy as np
import torch

import helper

import matplotlib.pyplot as plt
from torchvision import datasets, transforms
#%% [markdown]
# First up, we need to get our dataset. This is provided through the `torchvision` package. The code below will download the MNIST dataset, then create training and test datasets for us. Don't worry too much about the details here, you'll learn more about this later.

#%%
# Define a transform to normalize the data
# (MNIST images have a single channel, so Normalize takes one mean and one std)
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                               ])
# Download and load the training data
trainset = datasets.MNIST('MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.MNIST('MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)


#%%
dataiter = iter(trainloader)
images, labels = next(dataiter)

#%% [markdown]
# We have the training data loaded into `trainloader`, and we make that an iterator with `iter(trainloader)`. We'd use this to loop through the dataset for training, but here I'm just grabbing the first batch so we can check out the data. We can see below that `images` is just a tensor with size (64, 1, 28, 28). So, 64 images per batch, 1 color channel, and 28x28 images.
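
#%% [markdown]
# For reference, looping through the dataset looks something like the sketch below. It only checks batch shapes here, since the actual training code comes in a later notebook.

#%%
for images, labels in trainloader:
    # Each batch: images is (64, 1, 28, 28), labels is (64,)
    print(images.shape, labels.shape)
    break  # just peek at the first batch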
#%%
plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r');

#%% [markdown]
# ## Building networks with PyTorch
#
# Here I'll use PyTorch to build a simple feedforward network to classify the MNIST images. That is, the network will receive a digit image as input and predict the digit in the image.
#
# <img src="assets/mlp_mnist.png" width=600px>
#
# To build a neural network with PyTorch, you use the `torch.nn` module. The network itself is a class inheriting from `torch.nn.Module`. You define each of the operations separately, like `nn.Linear(784, 128)` for a fully connected linear layer with 784 inputs and 128 units.
#
# The class needs to include a `forward` method that implements the forward pass through the network. In this method, you pass some input tensor `x` through each of the operations you defined earlier. The `torch.nn` module also has functional equivalents for things like ReLUs in `torch.nn.functional`. This module is usually imported as `F`. Then to use a ReLU activation on some layer (which is just a tensor), you'd do `F.relu(x)`. Below are a few different commonly used activation functions.
#
# <img src="assets/activation.png" width=700px>
#
# So, for this network, I'll build it with three fully connected layers, then a softmax output for predicting classes. The softmax function is similar to the sigmoid in that it squashes inputs between 0 and 1, but it's also normalized so that all the values sum to one, like a proper probability distribution.
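
#%% [markdown]
# To see that normalization in action, a tiny sketch with made-up scores: softmax squashes them into (0, 1) and makes them sum to one.

#%%
scores = torch.tensor([[2.0, 1.0, 0.1]])
probs = torch.softmax(scores, dim=1)
print(probs, probs.sum())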
#%%
from torch import nn
from torch import optim
import torch.nn.functional as F


#%%
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Defining the layers: 128, 64, 10 units each
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        # Output layer, 10 units - one for each digit
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        ''' Forward pass through the network, returns the class probabilities '''

        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = F.softmax(x, dim=1)

        return x

model = Network()
model
#%% [markdown]
# ### Initializing weights and biases
#
# The weights and biases are automatically initialized for you, but it's possible to customize how they are initialized. The weights and biases are tensors attached to the layer you defined; you can get them with `model.fc1.weight`, for instance.

#%%
print(model.fc1.weight)
print(model.fc1.bias)

#%% [markdown]
# For custom initialization, we want to modify these tensors in place. These are actually autograd *Variables*, so we need to get back the actual tensors with `model.fc1.weight.data`. Once we have the tensors, we can fill them with zeros (for biases) or random normal values.

#%%
# Set biases to all zeros
model.fc1.bias.data.fill_(0)


#%%
# Sample from a random normal with standard deviation = 0.01
model.fc1.weight.data.normal_(std=0.01)
#%% [markdown]
# ### Forward pass
#
# Now that we have a network, let's see what happens when we pass in an image. This is called the forward pass. We're going to convert the image data into a tensor, then pass it through the operations defined by the network architecture.

#%%
# Grab some data
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Resize images into a 1D vector, new shape is (batch size, color channels, image pixels)
images.resize_(64, 1, 784)
# or images.resize_(images.shape[0], 1, 784) to get the batch size automatically

# Forward pass through the network
img_idx = 0
ps = model.forward(images[img_idx,:])

img = images[img_idx]
helper.view_classify(img.view(1, 28, 28), ps)

#%% [markdown]
# As you can see above, our network has basically no idea what this digit is. That's because we haven't trained it yet; all the weights are random!
#
# PyTorch provides a convenient way to build networks like this, where a tensor is passed sequentially through operations: `nn.Sequential` ([documentation](https://pytorch.org/docs/master/nn.html#torch.nn.Sequential)). Using this to build the equivalent network:
#%%
# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)

# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
ps = model.forward(images[0,:])
helper.view_classify(images[0].view(1, 28, 28), ps)

#%% [markdown]
# You can also pass in an `OrderedDict` to name the individual layers and operations. Note that dictionary keys must be unique, so _each operation must have a different name_.

#%%
from collections import OrderedDict
model = nn.Sequential(OrderedDict([
                      ('fc1', nn.Linear(input_size, hidden_sizes[0])),
                      ('relu1', nn.ReLU()),
                      ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
                      ('relu2', nn.ReLU()),
                      ('output', nn.Linear(hidden_sizes[1], output_size)),
                      ('softmax', nn.Softmax(dim=1))]))
model
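
#%% [markdown]
# With named layers, you can grab an individual operation by attribute name, or by integer index in either version. A quick sketch:

#%%
print(model.fc1)   # access by the name given in the OrderedDict
print(model[0])    # same layer, accessed by position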
#%% [markdown]
# Now it's your turn to build a simple network; use any method I've covered so far. In the next notebook, you'll learn how to train a network so it can make good predictions.
#
# >**Exercise:** Build a network to classify the MNIST images with _three_ hidden layers. Use 400 units in the first hidden layer, 200 units in the second layer, and 100 units in the third layer. Each hidden layer should have a ReLU activation function, and use softmax on the output layer.

#%%
## TODO: Your network here
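
#%% [markdown]
# One possible solution sketch using `nn.Sequential` (try writing your own before peeking):

#%%
# 784 inputs -> 400 -> 200 -> 100 hidden units -> 10 output classes
model = nn.Sequential(nn.Linear(784, 400),
                      nn.ReLU(),
                      nn.Linear(400, 200),
                      nn.ReLU(),
                      nn.Linear(200, 100),
                      nn.ReLU(),
                      nn.Linear(100, 10),
                      nn.Softmax(dim=1))
print(model)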
#%%
## Run this cell with your model to make sure it works ##
# Forward pass through the network and display output
images, labels = next(iter(trainloader))
images.resize_(images.shape[0], 1, 784)
ps = model.forward(images[0,:])
helper.view_classify(images[0].view(1, 28, 28), ps)
