Commit 6e7de3c

Add PyTorch概览

1 parent a667b06 commit 6e7de3c
1 file changed

Grokking PyTorch/Grokking PyTorch.md renamed to B.PyTorch概览/PyTorch概览.md

Lines changed: 17 additions & 24 deletions

@@ -1,24 +1,23 @@
-- [Grokking PyTorch](#grokking-pytorch)
-- [Imports](#imports)
-- [Setup](#setup)
-- [Data](#data)
-- [Model](#model)
-- [Training](#training)
-- [Testing](#testing)
-- [Extra](#extra)
+- [PyTorch 概览](#pytorch-概览)
+- [导入相关包](#导入相关包)
+- [配置](#配置)
+- [数据](#数据)
+- [模型](#模型)
+- [训练](#训练)
+- [测试](#测试)
+- [其他](#其他)
 
 
-Grokking PyTorch
-================
+# PyTorch 概览
+
 
 [PyTorch](https://pytorch.org/) is a flexible deep learning framework that allows automatic differentiation through dynamic neural networks (i.e., networks that utilise dynamic control flow, like if statements and while loops). It supports GPU acceleration, [distributed training](https://pytorch.org/docs/stable/distributed.html), [various optimisations](https://pytorch.org/2018/05/02/road-to-1.0.html), and plenty more neat features. These are some notes on how I think about using PyTorch; they don't encompass all parts of the library or every best practice, but may be helpful to others.
 
 Neural networks are a subclass of *computation graphs*. Computation graphs receive input data, which is routed to, and possibly transformed by, nodes that process it. In deep learning, the neurons (nodes) in neural networks typically transform data with parameters and differentiable functions, such that the parameters can be optimised to minimise a loss via gradient descent. More broadly, the functions can be stochastic and the structure of the graph can be dynamic. So while neural networks may be a good fit for [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming), PyTorch's API has instead centred on [imperative programming](https://en.wikipedia.org/wiki/Imperative_programming), a more common way of thinking about programs. This makes it easier to read code and reason about complex programs, without necessarily sacrificing much performance; PyTorch is actually pretty fast, with plenty of optimisations that you can safely forget about as an end user (but can dig into if you really want to).
 
 The rest of this document, based on the [official MNIST example](https://github.com/pytorch/examples/tree/master/mnist), is about *grokking* PyTorch, and should only be looked at after the [official beginner tutorials](https://pytorch.org/tutorials/). For readability, the code is presented in chunks interspersed with comments, and hence is not separated into different functions/files, as it would normally be for clean, modular code.
 
-Imports
--------
+## 导入相关包
 
 ```py
 import argparse
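
The import list in this hunk is cut off after `import argparse`. As a point of reference only, a script of this shape typically pulls in something like the following (an assumption based on the modules used later in the document, not necessarily this file's exact imports):

```py
import argparse
import os

import torch
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
```
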
@@ -32,8 +31,7 @@ from torchvision import datasets, transforms
 
 These are pretty standard imports, with the exception of the `torchvision` modules that are used for computer vision problems in particular.
 
-Setup
------
+## 配置
 
 ```py
 parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
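
The setup section opens with the `argparse` parser above. A minimal sketch of how such a parser tends to be filled in and consumed (the flag names and defaults here are hypothetical, chosen to match the later sketches, not taken from this file):

```py
import argparse

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, help='input batch size for training')
parser.add_argument('--epochs', type=int, default=10, help='number of epochs to train')
parser.add_argument('--lr', type=float, default=0.01, help='learning rate')
parser.add_argument('--no-cuda', action='store_true', default=False, help='disable CUDA training')
parser.add_argument('--seed', type=int, default=1, help='random seed')
args = parser.parse_args()  # e.g. python main.py --epochs 20 --no-cuda
```
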
@@ -68,8 +66,7 @@ A good way to write device-agnostic code (benefitting from GPU acceleration when
 
 For repeatable experiments, it is necessary to set random seeds for anything that uses random number generation (including `random` or `numpy`, if those are used too). Note that cuDNN uses nondeterministic algorithms, and cuDNN can be disabled altogether using `torch.backends.cudnn.enabled = False`.
 
-Data
-----
+## 数据
 
 ```py
 data_path = os.path.join(os.path.expanduser('~'), '.torch', 'datasets', 'mnist')
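
This hunk touches two pieces of setup: device-agnostic code (the truncated hunk header) and seeding for repeatable experiments. A minimal sketch of both, assuming the hypothetical `args` from the sketch above:

```py
import random

import numpy as np
import torch

# Device-agnostic: use the GPU when available and not explicitly disabled
use_cuda = torch.cuda.is_available() and not args.no_cuda  # `args` is an assumption
device = torch.device('cuda' if use_cuda else 'cpu')  # later, move tensors with .to(device)

# Seed everything that generates random numbers, including `random`/`numpy` if used
random.seed(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if use_cuda:
    torch.cuda.manual_seed(args.seed)
torch.backends.cudnn.enabled = False  # trade speed for determinism, as noted above
```
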
@@ -93,8 +90,7 @@ Since `torchvision` models get stored under `~/.torch/models/`, I like to store
 
 `DataLoader` contains many options, but beyond `batch_size` and `shuffle`, `num_workers` and `pin_memory` are worth knowing for efficiency. `num_workers` > 0 uses subprocesses to load data asynchronously, rather than making the main process block on it. The typical use case is loading data (e.g. images) from disk, and maybe transforming it too; this can be done in parallel with the network processing the data. You will want to tune the number of workers to a) minimise CPU and RAM usage (each worker loads a separate batch, not individual samples within a batch), and b) minimise the time the network spends waiting for data. `pin_memory` uses [pinned memory](https://pytorch.org/docs/master/notes/cuda.html#use-pinned-memory-buffers) (as opposed to paged memory) to speed up RAM-to-GPU transfers (it does nothing for CPU-only code).
 
-Model
------
+## 模型
 
 ```py
 class Net(nn.Module):
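
To make the `DataLoader` discussion above concrete, here is a sketch of train/test loaders for MNIST; the normalisation constants are the commonly used MNIST mean/std, and `data_path`/`use_cuda` are carried over from the sketches above rather than from this file:

```py
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean and std
])
# data_path and use_cuda come from the earlier sketches (assumptions)
train_data = datasets.MNIST(data_path, train=True, download=True, transform=transform)
test_data = datasets.MNIST(data_path, train=False, transform=transform)

# num_workers > 0 loads whole batches in subprocesses; pin_memory speeds up
# RAM-to-GPU transfers and does nothing for CPU-only runs
train_loader = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=4, pin_memory=use_cuda)
test_loader = DataLoader(test_data, batch_size=64, num_workers=4, pin_memory=use_cuda)
```
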
@@ -140,8 +136,7 @@ def forward(self, x, hx, drop=False):
     return hx2
 ```
 
-Training
---------
+## 训练
 
 ```py
 model.train()
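
The training section begins with `model.train()`. The full loop lives in the official MNIST example; as a minimal sketch of the usual shape (assuming `model`, `device` and `train_loader` from the sketches above, and an optimiser such as `optim.SGD(model.parameters(), lr=args.lr)`):

```py
from torch.nn import functional as F

model.train()  # enable training-time behaviour, e.g. dropout
for i, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    optimiser.zero_grad()              # clear gradients from the previous step
    output = model(data)
    loss = F.nll_loss(output, target)  # assumes the model returns log-probabilities
    loss.backward()                    # build and backpropagate through the graph
    optimiser.step()                   # update the parameters
    if i % 100 == 0:
        print(f'Step {i}: loss {loss.item():.4f}')
```
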
@@ -176,8 +171,7 @@ One way to cut the computation graph is to use `.detach()`, which you may use wh
 
 Apart from logging results to the console or a log file, it's important to checkpoint model parameters (and optimiser state), just in case. You can also use `torch.save()` to save normal Python objects, but other standard choices include the built-in `pickle`.
 
-Testing
--------
+## 测试
 
 ```py
 model.eval()
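
Two points from this hunk lend themselves to tiny sketches: cutting the graph with `.detach()` and checkpointing with `torch.save()`. The file names and the recurrent state `hx` below are illustrative assumptions:

```py
import torch

# Truncated backpropagation-through-time: keep the hidden state's values but
# stop gradients from flowing into earlier timesteps (`hx` is an assumption)
hx = hx.detach()

# Checkpoint parameters and optimiser state, just in case
torch.save(model.state_dict(), 'model.pth')
torch.save(optimiser.state_dict(), 'optimiser.pth')

# ...and restore them later to resume
model.load_state_dict(torch.load('model.pth'))
optimiser.load_state_dict(torch.load('optimiser.pth'))
```
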
@@ -201,8 +195,7 @@ In response to `.train()` earlier, networks should explicitly be set to evaluati
 
 As mentioned previously, the computation graph would normally be constructed when running a network. Using the `no_grad` context manager, via `with torch.no_grad()`, prevents this from happening.
 
-Extra
------
+## 其他
 
 This is an extra section just to add a few useful asides.
 
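
Tying the testing hunks together, a minimal evaluation sketch in the same spirit (again assuming `model`, `device` and `test_loader` from the sketches above):

```py
import torch
from torch.nn import functional as F

model.eval()  # switch off training-time behaviour such as dropout
test_loss, correct = 0, 0
with torch.no_grad():  # don't build the computation graph: saves memory and time
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum now, average below
        correct += (output.argmax(dim=1) == target).sum().item()
test_loss /= len(test_loader.dataset)
print(f'Test loss {test_loss:.4f}, accuracy {correct / len(test_loader.dataset):.2%}')
```
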