[Grokking PyTorch](#grokking-pytorch)
[Imports](#imports)
[Setup](#setup)
[Data](#data)
[Model](#model)
[Training](#training)
[Testing](#testing)
[Extra](#extra)

# Grokking PyTorch

[PyTorch](https://pytorch.org/) is a flexible deep learning framework that allows automatic differentiation through dynamic neural networks (i.e., networks that utilise dynamic control flow like if statements and while loops). It supports GPU acceleration, [distributed training](https://pytorch.org/docs/stable/distributed.html), [various optimisations](https://pytorch.org/2018/05/02/road-to-1.0.html), and plenty more neat features. These are some notes on how I think about using PyTorch, and don't encompass all parts of the library or every best practice, but may be helpful to others.

Neural networks are a subclass of *computation graphs*. Computation graphs receive input data, which is routed to, and possibly transformed by, nodes that process it. In deep learning, the neurons (nodes) in neural networks typically transform data with parameters and differentiable functions, such that the parameters can be optimised to minimise a loss via gradient descent. More broadly, the functions can be stochastic, and the structure of the graph can be dynamic. So while neural networks may be a good fit for [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming), PyTorch's API has instead centred around [imperative programming](https://en.wikipedia.org/wiki/Imperative_programming), which is a more common way of thinking about programs. This makes it easier to read code and reason about complex programs, without necessarily sacrificing much performance; PyTorch is actually pretty fast, with plenty of optimisations that you can safely forget about as an end user (but you can dig in if you really want to).
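
To make the dynamic-graph idea concrete, here is an illustrative sketch (the module, sizes and loop bound are invented for this example): the `forward` pass uses ordinary Python control flow, so a different graph can be built on every call.

```py
import torch
from torch import nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        # plain Python control flow: the number of layers applied is
        # decided at runtime, so each call can build a different graph
        for _ in range(torch.randint(1, 4, (1,)).item()):
            x = torch.relu(self.fc(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 10))  # a fresh graph with 1-3 linear layers
```
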
The rest of this document, based on the [official MNIST example](https://github.com/pytorch/examples/tree/master/mnist), is about *grokking* PyTorch, and should only be looked at after the [official beginner tutorials](https://pytorch.org/tutorials/). For readability, the code is presented in chunks interspersed with comments, and hence not separated into different functions/files as it would normally be for clean, modular code.

## Imports

```py
import argparse
import os
import torch
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```

These are pretty standard imports, with the exception of the `torchvision` modules that are used for computer vision problems in particular.

## Setup

A good way to write device-agnostic code (benefitting from GPU acceleration when available, but falling back to the CPU otherwise) is to pick and save the appropriate `torch.device`, which can then be used to determine where tensors should be stored.

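
As a minimal sketch of this pattern (the tensor size is illustrative):

```py
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.zeros(8, 32, device=device)  # created directly on the chosen device
y = x.to(device)  # .to() moves tensors, and is cheap if already in place
```
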
For repeatable experiments, it is necessary to set random seeds for anything that uses random number generation (including `random` or `numpy` if those are used too). Note that cuDNN uses nondeterministic algorithms, and it can be disabled using `torch.backends.cudnn.enabled = False`.
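
As a sketch, a seeding block along these lines covers the usual sources of randomness (the seed value is arbitrary, and `random`/`numpy` only need seeding if you actually use them):

```py
import random
import numpy as np
import torch

seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.enabled = False  # avoid nondeterministic cuDNN algorithms
```
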

## Data

Since `torchvision` models get stored under `~/.torch/models/`, I like to store datasets under `~/.torch/datasets` too.

`DataLoader` contains many options, but beyond `batch_size` and `shuffle`, `num_workers` and `pin_memory` are worth knowing for efficiency. `num_workers` > 0 uses subprocesses to asynchronously load data, rather than making the main process block on this. The typical use-case is when loading data (e.g. images) from disk and maybe transforming them too - this can be done in parallel with the network processing the data. You will want to tune the amount to a) minimise the number of workers and hence CPU and RAM usage (each worker loads a separate batch, not individual samples within a batch) b) minimise the time the network is waiting for data. `pin_memory` uses [pinned memory](https://pytorch.org/docs/master/notes/cuda.html#use-pinned-memory-buffers) (as opposed to paged memory) to speed up any RAM to GPU transfers (and does nothing for CPU-only code).
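
For example, a loader along these lines (the dataset path, batch size and worker count are illustrative choices, not recommendations):

```py
import os
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_data = datasets.MNIST(os.path.expanduser('~/.torch/datasets/mnist'),
                            train=True, download=True,
                            transform=transforms.ToTensor())
train_loader = DataLoader(train_data, batch_size=64, shuffle=True,
                          num_workers=4,    # subprocesses load batches in the background
                          pin_memory=True)  # speeds up RAM-to-GPU transfers
```
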

## Model

## Training

One way to cut the computation graph is to use `.detach()`, which you may use when passing on a hidden state when training RNNs with truncated backpropagation-through-time.

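
A tiny sketch of what `.detach()` does (the tensors are invented for illustration):

```py
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2
z = y.detach()  # same values as y, but cut out of the computation graph
print(y.requires_grad, z.requires_grad)  # True False
# for truncated BPTT, e.g.: hidden = hidden.detach() between subsequences
```
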
Apart from logging results in the console/in a log file, it's important to checkpoint model parameters (and optimiser state) just in case. You can also use `torch.save()` to save normal Python objects, but other standard choices include the built-in `pickle`.
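
As a hedged sketch (the stand-in model, dictionary layout and file name are arbitrary choices):

```py
import torch
from torch import nn, optim

model = nn.Linear(784, 10)  # stand-in for a real network
optimiser = optim.SGD(model.parameters(), lr=0.01)

# save parameters and optimiser state together
torch.save({'model': model.state_dict(),
            'optimiser': optimiser.state_dict()}, 'checkpoint.pth')

# ...and restore them later
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model'])
optimiser.load_state_dict(checkpoint['optimiser'])
```
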

## Testing

```py
model.eval()
```

In response to `.train()` earlier, networks should explicitly be set to evaluation mode using `.eval()`.

As mentioned previously, a computation graph is normally built whenever a network is used; wrapping the code in the `no_grad` context manager (`with torch.no_grad():`) prevents this from happening.
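
Putting the two together, a minimal evaluation loop might look like this sketch (it assumes the `model` and a `test_loader` built as in the Data section, and uses cross-entropy purely as an example loss):

```py
import torch
from torch.nn import functional as F

model.eval()  # switch dropout/batch norm layers to evaluation behaviour
test_loss, correct = 0, 0
with torch.no_grad():  # no computation graph is built inside this block
    for data, target in test_loader:
        output = model(data)
        test_loss += F.cross_entropy(output, target, reduction='sum').item()
        correct += (output.argmax(dim=1) == target).sum().item()
```
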

## Extra

This is an extra section just to add a few useful asides.