60-Minute PyTorch Quick Start - Zhihu (zhihu.com)

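# Chapter 2: `PyTorch` in `60min`

# What is `PyTorch`?

PyTorch is a Python-based scientific computing package aimed at two groups of users:

Those who want a replacement for NumPy that can use the power of GPUs.
Those who want a deep learning research platform with plenty of flexibility and speed.

# Getting started

# Tensors

Tensors are similar to NumPy's ndarrays, with the addition that Tensors can also run computations on a GPU.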

from __future__ import print_function
import torch

# torch.empty(): declare an uninitialized matrix

x = torch.empty(5, 3)
print(x)

Output:

tensor([[9.2737e-41, 8.9074e-01, 1.9286e-37],
     [1.7228e-34, 5.7064e+01, 9.2737e-41],
     [2.2803e+02, 1.9288e-37, 1.7228e-34],
     [1.4609e+04, 9.2737e-41, 5.8375e+04],
     [1.9290e-37, 1.7228e-34, 3.7402e+06]])

# torch.rand(): create a randomly initialized matrix

x = torch.rand(5, 3)
print(x)

Output:

tensor([[ 0.6291,  0.2581,  0.6414],
  [ 0.9739,  0.8243,  0.2276],
  [ 0.4184,  0.1815,  0.5131],
  [ 0.5533,  0.5440,  0.0718],
  [ 0.2908,  0.1850,  0.5297]])

# Check whether a GPU is available

torch.cuda.is_available()

# torch.zeros(): create a matrix filled with zeros

Construct a matrix filled with zeros and of dtype long:

x = torch.zeros(5, 3, dtype=torch.long)
print(x)

Output:

tensor([[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0]])

# torch.tensor(): create a tensor directly from data

# the tensor values are [5.5, 3]
x = torch.tensor([5.5, 3])
print(x)

Output:

tensor([ 5.5000, 3.0000])

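Besides the methods above, you can also create a new tensor from an existing one. The advantage is that the new tensor keeps properties of the existing tensor, such as its size or dtype, unless you explicitly override them. The corresponding methods are shown below:

# tensor.new_ones(): the new_*() methods take the size as input

tensor1 = torch.tensor([5.5, 3])  # an existing tensor to derive the new one from
# explicitly set the new size to 5*3 and the dtype to torch.double
tensor2 = tensor1.new_ones(5, 3, dtype=torch.double)  # new_* methods take the tensor size
print(tensor2)

Output: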

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

# torch.randn_like(old_tensor): keeps the same size

# override the dtype
tensor3 = torch.randn_like(tensor2, dtype=torch.float)
print('tensor3: ', tensor3)

Output. The new variable is created from tensor2, which was declared with the previous method. Its size is still 5*3, but its dtype has changed.

tensor3:  tensor([[-0.4491, -0.2634, -0.0040],
        [-0.1624,  0.4475, -0.8407],
        [-0.6539, -1.2772,  0.6060],
        [ 0.2304,  0.0879, -0.3876],
        [ 1.2900, -0.7475, -1.8212]])

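Finally, the size of a tensor can be obtained with tensor.size():

print(tensor3.size())
# Output: torch.Size([5, 3])

# Getting its shape:

x = tensor3  # reuse the 5*3 tensor as x so that the examples below run
print(x.size())

Output:

torch.Size([5, 3])

Note: torch.Size is actually a tuple, so it supports all tuple operations.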
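The tuple behaviour of torch.Size can be checked with a short sketch. The tensor t and the names rows and cols are only chosen here for illustration:

```python
import torch

t = torch.zeros(5, 3)
size = t.size()

# torch.Size subclasses tuple, so the usual tuple operations work
rows, cols = size                      # unpacking
print(isinstance(size, tuple))
print(rows, cols, len(size))
print(size[0] * size[1] == t.numel())  # indexing
```

Unpacking is the most common use in practice, for example to read the batch size and feature count of an input.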

# Operations

The following examples look at the addition operation.

# Addition

# The + operator

y = torch.rand(5, 3)
print(x + y)

Out:

tensor([[-0.1859,  1.3970,  0.5236],
     [ 2.3854,  0.0707,  2.1970],
     [-0.3587,  1.2359,  1.8951],
     [-0.1189, -0.1376,  0.4647],
     [-1.8968,  2.0164,  0.1092]])

# add

print(torch.add(x, y))

Out:

tensor([[-0.1859,  1.3970,  0.5236],
        [ 2.3854,  0.0707,  2.1970],
        [-0.3587,  1.2359,  1.8951],
        [-0.1189, -0.1376,  0.4647],
        [-1.8968,  2.0164,  0.1092]])

# Providing an output tensor with out=

result = torch.empty(5, 3)
torch.add(x, y, out=result)  # the result of x + y is stored in result
print(result)

Out:

tensor([[-0.1859,  1.3970,  0.5236],
     [ 2.3854,  0.0707,  2.1970],
     [-0.3587,  1.2359,  1.8951],
     [-0.1189, -0.1376,  0.4647],
     [-1.8968,  2.0164,  0.1092]])

# add_: modify the variable in place

# adds x to y
y.add_(x)
print(y)

Out:

tensor([[-0.1859,  1.3970,  0.5236],
     [ 2.3854,  0.0707,  2.1970],
     [-0.3587,  1.2359,  1.8951],
     [-0.1189, -0.1376,  0.4647],
     [-1.8968,  2.0164,  0.1092]])

Note:

Any operation that mutates a tensor in place carries the suffix `_`. For example, x.copy_(y) and x.t_() both change x.

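# Accessing a Tensor

Besides arithmetic such as addition, tensors can be indexed just like NumPy arrays to access the data along a given dimension, as shown below:

# Indexing

# access the second column of x (index 1)
print(x[:, 1])

Out:

tensor([ 0.4477, -0.0048, 1.0878, -0.2174, 1.3609])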

# Tensor.view(): changing the size of a Tensor

If you want to resize or reshape a tensor, you can use its view() method:

x = torch.randn(4, 4)
y = x.view(16)  # a 1-D tensor with 16 elements
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

Out:

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])

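In view(), -1 is a special argument value that asks PyTorch to work out the size of that dimension automatically. When you reshape a tensor, -1 is replaced by the value computed from the total number of elements and the sizes of the other dimensions, so that the new shape holds exactly as many elements as the original tensor. The total number of elements never changes.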
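A minimal sketch of how -1 is inferred, and of what happens when no value can preserve the element count. The 12-element tensor and the variable names are chosen only for illustration:

```python
import torch

x = torch.arange(12)           # 12 elements in total

a = x.view(3, -1)              # -1 is inferred as 12 / 3 = 4
b = x.view(-1, 6)              # -1 is inferred as 12 / 6 = 2
print(a.size(), b.size())

# 12 is not divisible by 5, so no value for -1 keeps the element count
try:
    x.view(-1, 5)
    failed = False
except RuntimeError as err:
    failed = True
    print("view failed:", err)
```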

# .item()

If a tensor has exactly one element, use .item() to get its value as a plain Python number:

x = torch.randn(1)
print(x)
print(x.item())

Out:

tensor([ 0.9422])
0.9422121644020081

For more operations, see the documentation:

torch — PyTorch 2.2 documentation

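# Converting to and from NumPy arrays

Tensors and NumPy arrays can be converted to each other. When the tensor lives on the CPU, the two share the same underlying memory, so changing one also changes the other.

# Converting a Tensor to a NumPy array

The example below converts a Tensor to a NumPy array by calling tensor.numpy().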

a = torch.ones(5)
print(a)
b = a.numpy()
print(b)

Output:

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
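The shared memory can be observed directly. The sketch below repeats the conversion and then modifies the tensor in place; it assumes a CPU tensor, as in the example above:

```python
import torch

a = torch.ones(5)
b = a.numpy()      # b is a NumPy view of the same CPU memory

a.add_(1)          # in-place add on the tensor
print(a)
print(b)           # the NumPy array reflects the change as well
```

If an independent copy is needed, a.clone().numpy() avoids the sharing.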

# Converting a NumPy array to a Tensor

The conversion is done by calling torch.from_numpy(numpy_array). Example:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

Output:

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

On the CPU, all Tensor types except CharTensor support conversion to and from NumPy arrays.

# CUDA Tensors

Tensors can be moved to different devices, i.e. the CPU or a GPU, with the .to method.

Example:

# Run the code below only when CUDA is available; torch.device() objects control whether tensors are computed on the GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # create a tensor directly on the GPU
    x = x.to(device)                       # .to("cuda") also works
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # .to() can also change the dtype

Output. The first result lives on the GPU, so printing it shows device='cuda:0'; the second is the tensor on the CPU.

tensor([1.4549], device='cuda:0')
tensor([1.4549], dtype=torch.float64)

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/basic_practise.ipynb

# autograd

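For neural networks in PyTorch, a key package is autograd, which provides automatic differentiation for all operations on Tensors, i.e. it computes gradients.

It is a define-by-run framework: backpropagation is defined by how your code runs, so every iteration can be different.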

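# Tensors

torch.Tensor is the central class of PyTorch. When its attribute .requires_grad is set to True, it starts tracking all operations on that tensor. Once the computation is finished, you can call .backward() to compute all gradients automatically; the gradients are accumulated into the .grad attribute.

Calling .detach() separates a tensor from the computation history. It stops the tensor from tracking its history and prevents future computation on it from being tracked.

Using with torch.no_grad(): tells PyTorch: "Right now I only want to run forward computation with the model and do not need gradient updates, so please do not store the information required for them, to save computation and memory." The model then runs faster and uses fewer resources.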
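A minimal sketch of the difference between tracked computation, .detach(), and torch.no_grad(). The tensors x, y, d and z are only illustrative names:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

y = x * 2
print(y.requires_grad)          # the operation is tracked

d = y.detach()                  # same values, cut off from the history
print(d.requires_grad)

with torch.no_grad():
    z = x * 2                   # computed without recording history
print(z.requires_grad)
```

.detach() works on a single tensor, while torch.no_grad() switches tracking off for every operation inside the block, which is why it is the usual choice for evaluation loops.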

# Function

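Another class that is very important for the autograd implementation is Function.

Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of the computation. Every tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None).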

<details>
<summary>grad_fn</summary>
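In deep learning, an important step of model training is computing the gradient (derivative) of the loss function (the gap between the model output and the ground truth) with respect to the model parameters, and then updating the parameters according to these gradients so that the loss decreases. This process is called gradient descent. PyTorch supports it by building a computation graph made up of instances of the two classes Tensor and Function.
Tensor
In PyTorch, a Tensor is a multi-dimensional array that stores the model's input data, parameters, output data, and the intermediate data produced during computation. Each Tensor can track how it was created, that is, which operation produced it from other Tensors.
Function
Every operation, whether a simple mathematical operation or a complex neural network layer, can be seen as a Function. A Function not only performs the computation but also records its details so that gradients can later be backpropagated.
Computation graph
When you run operations in PyTorch, you are actually building a computation graph. The graph consists of nodes (Tensors) and edges (Functions, representing operations). It is built in the forward direction: starting from the input Tensors, passing through the operations, and ending at the output Tensor. This process is called forward propagation.
The .grad_fn attribute
Every Tensor has a .grad_fn attribute, a reference to the Function that produced this Tensor. If the Tensor was created directly by the user rather than by an operation, its .grad_fn is None, because it was not computed.
Acyclic graph
The computation graph is acyclic: data flows in one direction, from input to output, without any loops or cycles. This keeps forward propagation and backpropagation (used to compute gradients) simple and clear.

Why does this matter?
During backpropagation, PyTorch walks this graph backward from the output, applying the chain rule to compute the gradient of each parameter automatically. The process is fully automated, so users do not have to write gradient code by hand, which greatly simplifies training deep learning models.

In short, Tensor and Function are linked through the computation graph, which tracks the whole computation and supports automatic differentiation, making the training of deep learning models more efficient and simpler.
</details>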

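To compute derivatives, call .backward() on a Tensor. If the Tensor is a scalar, i.e. it holds a single element, .backward() needs no arguments. If it holds more elements, you must pass a gradient argument, a tensor of matching shape; see the Gradients subsection below.

The code below illustrates this further.

First import the required package:

import torch

Create a tensor and set requires_grad=True to track the operations on it:

x = torch.ones(2, 2, requires_grad=True)
print(x)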

Output:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Run any operation on it; here we use a simple addition:

y = x + 2
print(y)

Output:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)

y is the result of an operation, so it has a grad_fn attribute:

print(y.grad_fn)

Output:

<AddBackward object at 0x00000216D25DCC88>

Continue operating on y:

z = y * y * 3
out = z.mean()

print('z=', z)
print('out=', out)

Output:

z= tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward>)

out= tensor(27., grad_fn=<MeanBackward1>)

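In fact, the default requires_grad of a Tensor is False. You can set it to True when defining the variable, as above, or call .requires_grad_(True) afterwards to set it to True. The trailing _ means the method changes the variable itself in place, as explained for add_() in the previous section.

Code example:

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

The output is shown below. The first line is the result before requires_grad is set; after explicitly calling .requires_grad_(True), the output is True.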

False

True

<SumBackward0 object at 0x00000216D25ED710>

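# Gradients

Now we compute the gradients by backpropagation. The out variable was defined in the previous subsection. It is a scalar, so out.backward() is equivalent to out.backward(torch.tensor(1.)).

The code is as follows:

out.backward()
# print the gradient d(out)/dx
print(x.grad)

Output:

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])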

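The result should be a matrix whose entries are all 4.5. Let o denote the out variable. From the definitions above we have:

$o = \frac{1}{4} \sum_i z_i,$

$z_i = 3(x_i + 2)^2,$

$z_i \big|_{x_i=1} = 27$

In detail, x is initialized as a matrix of ones; the addition x+2 gives y, and y*y*3 gives z. Since z is a 2*2 matrix, taking its mean to obtain out divides the sum by 4, which yields the three formulas above.

Therefore, the gradient is: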

$\frac{\partial o}{\partial x_i} = \frac{3}{2} (x_i + 2),$

$\left.\frac{\partial o}{\partial x_i}\right|_{x_i=1} = \frac{9}{2} = 4.5$
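The derivation can be checked against autograd. The sketch below rebuilds the same computation and compares x.grad with the analytic formula; the name expected is introduced here only for the comparison:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()

# Analytic gradient from the derivation: d(out)/dx_i = 3/2 * (x_i + 2)
expected = 1.5 * (x.detach() + 2)
print(x.grad)
print(torch.allclose(x.grad, expected))
```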

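Mathematically, if you have a vector-valued function:

$\vec{y} = f(\vec{x})$

then the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix:

$J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$

Generally speaking, torch.autograd is an engine for computing vector-Jacobian products. We skip the remaining math here and go straight to a code example: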

x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)
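Since y is no longer a scalar here, calling y.backward() requires a vector argument. The sketch below follows the same doubling loop but starts from a fixed x so the result is reproducible; the vector v and the name scale are assumptions made for this illustration:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = x * 2
while y.norm() < 1000:
    y = y * 2

# y is not a scalar, so backward() needs a vector v of the same shape;
# autograd then computes the vector-Jacobian product v^T * J
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

# Here y = scale * x, so J = scale * I and the product is simply scale * v
scale = (y / x).detach()[0]
print(x.grad)
print(torch.allclose(x.grad, scale * v))
```

Passing v amounts to weighting each component of y before summing, which is why a scalar output needs no argument: its implicit weight is 1.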

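# Neural networks

In PyTorch, torch.nn is the package dedicated to building neural networks. An nn.Module contains the layers of a network and a method forward(input) that returns the network output.

Below is the classic LeNet network, used to classify characters.

A typical training procedure for a neural network is as follows:

Define a neural network with several layers
Preprocess the dataset and prepare it as input to the network
Feed the data into the network
Compute the loss
Backpropagate to compute the gradients
Update the weights of the network; a simple update rule is weight = weight - learning_rate * gradient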

# Defining the network

First define a neural network. Below is a 5-layer convolutional network with two convolutional layers and three fully connected layers:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # the input image has 1 channel; conv1 has a 5*5 kernel and 6 output channels
        self.conv1 = nn.Conv2d(1, 6, 5)
        # conv2 has a 5*5 kernel and 16 output channels
        self.conv2 = nn.Conv2d(6, 16, 5)
        # fully connected layers
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # for a square kernel a single number is enough, e.g. 2 instead of (2, 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        # all dimensions except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Printed network structure:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

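Only the forward function has to be implemented here; the backward function is defined automatically by autograd. Any tensor operation can be used in the forward method.

net.parameters() returns the learnable parameters of the network. Usage example:

params = list(net.parameters())
print('Number of parameters: ', len(params))
# conv1.weight
print('Size of the first parameter: ', params[0].size())

Output:

Number of parameters:  10
Size of the first parameter:  torch.Size([6, 1, 5, 5])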

Next, test the network briefly with a random 32*32 input:

# define a random input for the network
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output:

tensor([[ 0.1005, 0.0263, 0.0013, -0.1157, -0.1197, -0.0141, 0.1425, -0.0521, 0.0689, 0.0220]], grad_fn=<ThAddmmBackward>)

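Next, backpropagation requires clearing the gradient buffers first, and then we backpropagate a random gradient:

# zero the gradient buffers of all parameters, then backpropagate a random gradient
net.zero_grad()
out.backward(torch.randn(1, 10))

Note:

torch.nn only supports **mini-batches**, so the input cannot be a single sample. For example, nn.Conv2d takes a 4-D tensor of nSamples * nChannels * Height * Width.
If you have a single sample, use **input.unsqueeze(0)** to add a fake batch dimension, turning the 3-D tensor into a 4-D one.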
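A minimal sketch of adding the batch dimension with unsqueeze(0). The layer conv and the random sample are stand-ins created only for this example:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

sample = torch.randn(1, 32, 32)        # one image: nChannels * Height * Width
batch = sample.unsqueeze(0)            # add a batch dimension at position 0
print(sample.size(), batch.size())

out = conv(batch)
print(out.size())
```

The output keeps the leading batch dimension of size 1, which can be removed again with squeeze(0) if a single result is wanted.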

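# Loss function

A loss function takes the pair (output, target), i.e. the network output and the ground-truth label, and returns a value that measures how far the network output is from the target.

PyTorch already defines quite a few loss functions. Here we use the simple mean squared error, nn.MSELoss, as an example:

output = net(input)
# define a dummy target
target = torch.randn(10)
# reshape it to the same size as output
target = target.view(1, -1)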
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

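The output is:

tensor(0.6524, grad_fn=<MseLossBackward>)

The computation graph that the data passes through from the network input to the output is shown below. It is simply the path of the data from the input layer to the output layer and on to the loss.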

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

When loss.backward() is called, the whole graph is differentiated with respect to the loss, and every tensor in the graph with requires_grad=True has the gradient accumulated into its .grad tensor.

To illustrate with code:

# MSELoss
print(loss.grad_fn)
# Linear layer
print(loss.grad_fn.next_functions[0][0])
# Relu
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

Output:

<MseLossBackward object at 0x0000019C0C349908>

<ThAddmmBackward object at 0x0000019C0C365A58>

<ExpandBackward object at 0x0000019C0C3659E8>

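# Backpropagation

Backpropagation only requires calling loss.backward(). Before that, the existing gradient buffers must be cleared with .zero_grad(); otherwise the previous gradients are accumulated into the current ones, which affects the weight update.

Below is a simple example that compares the gradient of the bias of the conv1 layer before and after backpropagation:

# zero the gradient buffers of all parameters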
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Output:

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])

conv1.bias.grad after backward
tensor([ 0.0069,  0.0021,  0.0090, -0.0060, -0.0008, -0.0073])
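The accumulation that zero_grad() prevents can be made visible with a small sketch. It uses a tiny nn.Linear layer instead of the network above; the names layer, inp, first, accumulated and fresh are chosen only for this illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
inp = torch.randn(4, 3)

layer(inp).sum().backward()
first = layer.bias.grad.clone()

# A second backward pass without zero_grad() adds to the stored gradient
layer(inp).sum().backward()
accumulated = layer.bias.grad.clone()

layer.zero_grad()                      # reset before the next pass
layer(inp).sum().backward()
fresh = layer.bias.grad.clone()

print(first, accumulated, fresh)
```

The second gradient is twice the first, while the one computed after zero_grad() matches the first again.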

To learn more about the torch.nn package, see the official documentation:

https://pytorch.org/docs/stable/nn.html

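# Updating the weights

The simplest weight update rule, used by Stochastic Gradient Descent (SGD), is:

weight = weight - learning_rate * gradient

Following this rule, the code is:

# a simple example of updating the weights
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

This is only the simplest rule. Deep learning has many optimization algorithms besides SGD, such as Nesterov-SGD, Adam, and RMSProp. To use these different methods, we turn to the torch.optim package. Usage example: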

import torch.optim as optim
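# create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# run the following in the training loop
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
# update the weights
optimizer.step()

Note that optimizer.zero_grad() must likewise be called to clear the gradient buffers.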
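To see that plain optim.SGD applies the same rule as the manual update, the sketch below takes one step with each on two identical copies of a small layer. The layer sizes, the random data and the names net_a and net_b are assumptions made for this comparison, not part of the tutorial's network:

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_a = nn.Linear(4, 2)
net_b = copy.deepcopy(net_a)           # identical starting weights
inp, target = torch.randn(8, 4), torch.randn(8, 2)
criterion = nn.MSELoss()
lr = 0.01

# Manual rule: weight = weight - learning_rate * gradient
criterion(net_a(inp), target).backward()
with torch.no_grad():
    for p in net_a.parameters():
        p -= lr * p.grad

# The same step through torch.optim
optimizer = optim.SGD(net_b.parameters(), lr=lr)
optimizer.zero_grad()
criterion(net_b(inp), target).backward()
optimizer.step()

same = all(torch.allclose(a, b)
           for a, b in zip(net_a.parameters(), net_b.parameters()))
print(same)
```

The equivalence only holds for SGD without momentum or weight decay; with those options the optimizer keeps extra state that the one-line rule does not model.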

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

Code for this section:

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/neural_network.ipynb

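# Training a classifier

The previous section covered how to build a neural network, compute the loss, and update the network's weights. The next step is to implement an image classifier.

# Training data

Before training a classifier, we need to think about the data. When dealing with image, text, audio, or video data, you can usually load it with standard Python packages into a NumPy array and then convert that array into a PyTorch tensor.

For images, you can use Pillow or OpenCV;
For audio, there are scipy and librosa;
For text, you can load the data with plain Python or Cython, or use NLTK and SpaCy.

For computer vision, PyTorch provides a dedicated package called torchvision. It includes data loaders for common datasets such as Imagenet, CIFAR10, and MNIST, as well as data transformers for images; the relevant modules are torchvision.datasets and torch.utils.data.DataLoader.

This tutorial uses the CIFAR10 dataset. It contains 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images in the dataset are all of size 3x32x32. Some examples are shown below: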

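# Training an image classifier

The training procedure is as follows:

Load and normalize the CIFAR10 training and test sets using torchvision;
Build a convolutional neural network;
Define a loss function;
Train the network on the training set;
Test the network on the test set.

# Loading and normalizing CIFAR10

First import the required packages: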

import torch
import torchvision
import torchvision.transforms as transforms

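The torchvision datasets output PILImage images with values in the range [0, 1]. Here they are converted to Tensors normalized to the range [-1, 1].

The code is as follows:

# normalize the image data from [0, 1] to the range [-1, 1]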
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
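The arithmetic behind Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) is (value - mean) / std per channel. The sketch below applies it by hand to a few made-up pixel values in plain torch, so it runs without downloading the dataset; the names pixels, normalized and restored are only illustrative:

```python
import torch

# Pixel values as ToTensor() would produce them, in [0, 1]
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

mean, std = 0.5, 0.5
normalized = (pixels - mean) / std     # the per-channel Normalize formula
restored = normalized / 2 + 0.5        # the inverse mapping back to [0, 1]

print(normalized)
print(restored)
```

The inverse mapping is what an image needs before display, since plotting libraries expect values in [0, 1].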

Once the data is downloaded, some training images can be visualized with the following code:

import matplotlib.pyplot as plt
import numpy as np

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show the images
imshow(torchvision.utils.make_grid(images))
# print the class labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

The images are displayed as follows:

Their class labels are:

frog plane dog ship

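# Building a convolutional neural network

This part simply reuses the network defined in the previous section, except that the number of input channels of conv1 changes from 1 to 3, because this time the network receives 3-channel color images.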

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

# Defining the loss function and optimizer

Here we use the categorical cross-entropy loss and SGD with momentum:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
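nn.CrossEntropyLoss takes the raw network outputs (logits) together with integer class indices. A minimal sketch, with made-up logits and labels, showing that it matches log_softmax followed by nll_loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw outputs for 4 samples, 10 classes
labels = torch.tensor([3, 8, 8, 0])    # class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), manual.item())
```

This is why the network's last layer returns plain scores without a softmax: the loss applies it internally.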

# Training the network

The fourth step is to train the network: choose the number of epochs, feed in the data, and print information about the network at fixed intervals, such as the loss or the accuracy.

import time
start = time.time()
for epoch in range(2):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # zero the gradient buffers
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time()-start)

Training runs for 2 epochs in total. The training log is shown below; it took about 77s.

[1,  2000] loss: 2.226
[1,  4000] loss: 1.897
[1,  6000] loss: 1.725
[1,  8000] loss: 1.617
[1, 10000] loss: 1.524
[1, 12000] loss: 1.489
[2,  2000] loss: 1.407
[2,  4000] loss: 1.376
[2,  6000] loss: 1.354
[2,  8000] loss: 1.347
[2, 10000] loss: 1.324
[2, 12000] loss: 1.311

Finished Training! Total cost time:  77.24696755409241

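# Testing the model

Once a network has been trained, it should be evaluated on the test set to check how well it generalizes. For image classification, accuracy is the usual metric.

First, a quick test on one batch of images. Here batch=4, i.e. 4 images. The code is:

dataiter = iter(testloader)
images, labels = next(dataiter)

# show the images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

The images and labels are shown below:

GroundTruth: cat ship ship plane

Then feed these four images to the network and look at its predictions: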

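# network outputs
outputs = net(images)

# predictions
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

Output:

Predicted: cat ship ship ship

The first three images are predicted correctly; the fourth, a plane, is misclassified as a ship.

Next, let's see what accuracy the network reaches on the whole test set!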

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

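The output is as follows:

Accuracy of the network on the 10000 test images: 55 %

Your accuracy may differ; the result in the tutorial is 51%, and it can fluctuate somewhat because of weight initialization. Compared with the accuracy of randomly guessing among 10 classes (10%), this result is good. In absolute terms it is of course still poor, but we only use a 5-layer network, and it only serves as example code for a tutorial.

We can go one step further and look at the accuracy for each class. The difference from the code above is the line c = (predicted == labels).squeeze(). It yields 1 or 0 (true or false) depending on whether the prediction equals the true label, so when counting the correct predictions of the current class we can simply add these values: a correct prediction adds 1 and a wrong one adds 0, leaving the count unchanged.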

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))

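Output. Cat, deer, and bird have the three highest error rates, i.e. they are the three classes predicted least accurately, while truck and ship are the most accurate.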

Accuracy of plane : 58 %
Accuracy of   car : 59 %
Accuracy of  bird : 40 %
Accuracy of   cat : 33 %
Accuracy of  deer : 39 %
Accuracy of   dog : 60 %
Accuracy of  frog : 54 %
Accuracy of horse : 66 %
Accuracy of  ship : 70 %
Accuracy of truck : 72 %

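# Training on a GPU

Deep learning naturally relies on GPUs to speed up training, so this part shows how to train on a GPU.

First, check whether a GPU is available for training:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

The output is shown below. It means that your first (or only) GPU is available; otherwise cpu is printed.

cuda:0

Since a GPU is available, the next step is to train on it. The code that needs to change is shown below: both the network parameters and the data must be moved to the GPU:

net.to(device)
inputs, labels = inputs.to(device), labels.to(device)

The modified training code:

import time
# when training on a GPU, both the network and the data must be placed on the GPU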
net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

start = time.time()
for epoch in range(2):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # zero the gradient buffers
        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:
            # print once every 2000 iterations
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i+1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training! Total cost time: ', time.time() - start)

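Note that after calling net.to(device), the optimizer should be defined again so that it receives the network parameters as CUDA tensors. The training results are similar to the earlier ones. Because this network is very small, moving it to the GPU does not bring much of a speedup; in my run training actually became slower, which may also be due to the GPU in my laptop.

To speed things up further, you can consider using multiple GPUs, which is the topic of the next section.

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

Code for this section: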

https://github.com/ccc013/DeepLearning_Notes/blob/master/Pytorch/practise/train_classifier_example.ipynb

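# Data parallelism

This part of the tutorial shows how to use DataParallel to train a network on multiple GPUs.

First, training a model on a GPU is simple. As the code below shows, define a device object and then use the .to() method to put the model parameters on the specified GPU.

device = torch.device("cuda:0")
model.to(device)

Then put all tensors on the GPU:

mytensor = my_tensor.to(device)

Note that my_tensor.to(device) returns a new copy of my_tensor on the GPU instead of modifying my_tensor itself, so you need to assign the result to a new tensor and use that tensor afterwards.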
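The return-value behaviour of .to() can be tried without a GPU. The sketch below uses a dtype conversion instead of a device move so that it runs anywhere; that substitution, and the names converted and returned, are assumptions made for this illustration:

```python
import torch
import torch.nn as nn

my_tensor = torch.ones(3)                      # float32 on the CPU
converted = my_tensor.to(torch.float64)        # returns a new tensor

print(my_tensor.dtype, converted.dtype)        # the original is unchanged

model = nn.Linear(3, 1)
returned = model.to(torch.float64)             # a module is converted in place
print(returned is model, model.weight.dtype)
```

This is the reason model.to(device) works as a bare statement while a tensor needs the assignment.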

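PyTorch uses only one GPU by default. To use multiple GPUs, wrap the model in DataParallel, as shown below:

model = nn.DataParallel(model)

This line is the core of this section; the rest of the section explains it in more detail.

# Imports and parameters

First import the required packages and define some parameters: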

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Parameters and DataLoaders
input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

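These mainly define the input and output sizes of the network, the batch size, and the size of the dataset, along with a device object.

# Building a dummy dataset

Next, build a dummy (random) dataset. The implementation is as follows: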

class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len

rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)

# A simple model

Next, build a simple network model with a single fully connected layer. A print() call is added to monitor the sizes of the input and output tensors:

class Model(nn.Module):
    # Our model

    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, input):
        output = self.fc(input)
        print("\tIn Model: input size", input.size(),
              "output size", output.size())

        return output

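# Creating the model and DataParallel

This is the core of this section. First create a model instance and check whether multiple GPUs are available. If so, wrap the model in nn.DataParallel, and then call model.to(device). The code is: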

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)

model.to(device)

# Running the model

Now run the model and look at the printed information:

for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Outside: input size", input.size(),
          "output_size", output.size())

Output:

        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
        In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
        In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

# Results

With no GPU or only one GPU, a batch of 30 inputs reaches the model as 30 inputs and produces 30 outputs. With multiple GPUs, the results look like this:

# 2 GPUs

# on 2 GPUs
Let's use 2 GPUs!
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    In Model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    In Model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

# 3 GPUs

Let's use 3 GPUs!
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
    In Model: input size torch.Size([10, 5]) output size torch.Size([10, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

# 8 GPUs

Let's use 8 GPUs!
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([4, 5]) output size torch.Size([4, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([30, 5]) output_size torch.Size([30, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
    In Model: input size torch.Size([2, 5]) output size torch.Size([2, 2])
Outside: input size torch.Size([10, 5]) output_size torch.Size([10, 2])

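# Summary

DataParallel automatically splits the data and sends the jobs to multiple models on multiple GPUs. After each model finishes its job, DataParallel collects and merges the results and returns them.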
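The per-GPU batch sizes in the logs above can be reproduced on the CPU by splitting a batch along dimension 0 with torch.chunk. This only imitates the split; that it mirrors DataParallel's internal scatter step is an assumption, and split_sizes is a hypothetical helper written for this sketch:

```python
import torch

def split_sizes(batch_size, n_gpus):
    """Sizes of the pieces torch.chunk produces along dim 0."""
    batch = torch.zeros(batch_size, 5)
    return [piece.size(0) for piece in torch.chunk(batch, n_gpus, dim=0)]

print(split_sizes(30, 2))
print(split_sizes(30, 3))
print(split_sizes(10, 3))
print(split_sizes(30, 8))
print(split_sizes(10, 8))
```

The last case yields only five pieces, which matches the final batch of the 8-GPU log, where just five replicas receive data.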

A more detailed tutorial on data parallelism:

https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html

Tutorial for this section:

https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html

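# Conclusion

The tutorial started with the most basic tensors, then introduced the all-important automatic differentiation package autograd, went on to show how to build a neural network and train an image classifier, and finally gave a brief introduction to speeding up training with multiple GPUs.

That completes the quick-start tutorial. Next, you can choose to:

Train a neural network to play video games
Train a ResNet on imagenet
Train a face generator using a GAN
Train a word-level language model using recurrent LSTM networks
Explore more examples
Read more tutorials
Discuss PyTorch on the Forums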

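# Project exercise: handwritten digit recognition with MNIST

softmax normalization

Tuning parameters with gradient descent and similar methods

Training proceeds batch by batch: each group of samples is one batch

The computation in a neural network is linear, but we need non-linear results

So a non-linear function f(), also called an activation function, is applied at every node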

$x_j^{k+1} = f\left(\sum_i a_{i,j}^k \cdot x_i^k + b_j^k\right)$

Install the four libraries together

pip install numpy torch torchvision matplotlib

Handwritten digit recognition exercise

import torch
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST


# Define a neural network
class Net(torch.nn.Module):

    def __init__(self):
        # Body of the network: four fully connected layers
        super().__init__()
        self.fc1 = torch.nn.Linear(28*28, 64)  # the input is a 28*28-pixel image
        self.fc2 = torch.nn.Linear(64, 64)
        self.fc3 = torch.nn.Linear(64, 64)
        self.fc4 = torch.nn.Linear(64, 10)
        # the three hidden layers have 64 nodes each; the output is the 10 digit classes

    def forward(self, x):  # x is the image input
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        x = torch.nn.functional.relu(self.fc3(x))
        # the last layer is fc4; log_softmax improves numerical stability
        x = torch.nn.functional.log_softmax(self.fc4(x), dim=1)
        return x


# Load the data
def get_data_loader(is_train):
    # define the data transform
    to_tensor = transforms.Compose([transforms.ToTensor()])
    data_set = MNIST("", is_train, transform=to_tensor, download=True)
    return DataLoader(data_set, batch_size=15, shuffle=True)


def evaluate(test_data, net):
    n_correct = 0
    n_total = 0
    with torch.no_grad():  # take data from the test set batch by batch
        for (x, y) in test_data:
            outputs = net.forward(x.view(-1, 28*28))
            for i, output in enumerate(outputs):
                if torch.argmax(output) == y[i]:
                    n_correct += 1
                n_total += 1
    return n_correct / n_total  # return the accuracy


def main():

    train_data = get_data_loader(is_train=True)
    test_data = get_data_loader(is_train=False)
    net = Net()

    print("initial accuracy:", evaluate(test_data, net))
    # train the model; this is the standard PyTorch pattern
    optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
    for epoch in range(2):
        for (x, y) in train_data:
            net.zero_grad()  # reset the gradients
            output = net.forward(x.view(-1, 28*28))  # forward pass
            loss = torch.nn.functional.nll_loss(output, y)  # compute the loss
            loss.backward()  # backpropagate the error
            optimizer.step()  # update the network parameters
        print("epoch", epoch, "accuracy:", evaluate(test_data, net))
    # if everything works, the accuracy keeps rising

    # after training, show the prediction for the first image of a few test batches
    for (n, (x, _)) in enumerate(test_data):
        if n > 3:
            break
        predict = torch.argmax(net.forward(x[0].view(-1, 28*28)))
        plt.figure(n)
        plt.imshow(x[0].view(28, 28))
        plt.title("prediction: " + str(int(predict)))
    plt.show()


if __name__ == "__main__":
    main()

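Annotated version

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # First convolutional layer: 1 input channel (single-channel image, e.g. grayscale),
        # 6 output channels, 5*5 kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        # Second convolutional layer: 6 input channels (the output of conv1),
        # 16 output channels, again a 5*5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # First fully connected layer (fc1): 16*5*5 input features (the feature-map size
        # after two rounds of convolution and pooling on a 32*32 input), 120 output features
        self.fc1 = nn.Linear(16*5*5, 120)
        # Second fully connected layer: 120 input features, 84 output features
        self.fc2 = nn.Linear(120, 84)
        # Third fully connected layer: 84 input features, 10 output features
        # (the number of classes, assuming a classification problem)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # First convolutional layer, then ReLU, then 2x2 max pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # Second convolutional layer, again with ReLU and 2x2 max pooling
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten everything except the batch dimension for the fully connected layers
        x = x.view(-1, self.num_flat_features(x))
        # First fully connected layer followed by ReLU
        x = F.relu(self.fc1(x))
        # Second fully connected layer, also followed by ReLU
        x = F.relu(self.fc2(x))
        # Last fully connected layer produces the final output; no activation here
        # because a softmax may follow for classification
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        # Product of all dimensions except the batch dimension, i.e. the number of
        # features to flatten before the fully connected layers
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

This code defines a simple convolutional neural network with two convolutional layers and three fully connected layers. It shows how to build a network in PyTorch and how to apply convolution, activation functions, pooling, and fully connected layers. Note that self.conv2 must be defined in `__init__`: forward calls it, and the 16*5*5 input size of fc1 depends on its output, so the network cannot run without it.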
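The `16*5*5` input size of `fc1` can be checked by tracing a tensor through two convolution and pooling stages. The sketch below rebuilds only those layers on their own and assumes a 32*32 single-channel input, which the notes above do not state explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)     # 1 -> 6 channels, 5*5 kernel, no padding
conv2 = nn.Conv2d(6, 16, 5)    # 6 -> 16 channels, 5*5 kernel, no padding

x = torch.randn(1, 1, 32, 32)  # assumed input: a batch of one 32*32 grayscale image
shapes = [tuple(x.shape)]

with torch.no_grad():
    x = F.max_pool2d(F.relu(conv1(x)), 2)   # 32 -> 28 (conv) -> 14 (pool)
    shapes.append(tuple(x.shape))
    x = F.max_pool2d(F.relu(conv2(x)), 2)   # 14 -> 10 (conv) -> 5 (pool)
    shapes.append(tuple(x.shape))
    flat = x.view(x.size(0), -1)            # flatten all but the batch dimension
    shapes.append(tuple(flat.shape))

for s in shapes:
    print(s)
# (1, 1, 32, 32)
# (1, 6, 14, 14)
# (1, 16, 5, 5)
# (1, 400)
```

The flattened size 400 equals 16*5*5, which is why `fc1` is declared as `nn.Linear(16*5*5, 120)`.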

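forward: forward propagation.
Each step applies a fully connected linear operation such as self.fc1(x) and then wraps it in an activation function; x is the image input.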

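In get_data_loader, the first argument to MNIST is the download directory; the empty string "" means the current directory.

is_train specifies whether to load the training set or the test set.

batch_size=15 means each batch contains 15 images.

shuffle=True means the order of the data is shuffled.

The function returns a DataLoader.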
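To see how these `DataLoader` settings behave without downloading MNIST, here is a small sketch on synthetic data. The 100 random "images" and labels are made up for illustration; only `batch_size=15` and `shuffle=True` are taken from the notes above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
images = torch.rand(100, 1, 28, 28)        # fake data: 100 single-channel 28*28 images
labels = torch.randint(0, 10, (100,))      # fake labels in the range 0..9

loader = DataLoader(TensorDataset(images, labels), batch_size=15, shuffle=True)

# Iterating over the loader yields (inputs, labels) pairs, one batch at a time.
batch_sizes = [x.shape[0] for x, y in loader]
x, y = next(iter(loader))
flat = x.view(-1, 28*28)                   # flatten each image for a fully connected layer

print(len(loader))                         # 7 batches: six of 15 and a final one of 10
print(batch_sizes)
print(tuple(x.shape), tuple(y.shape))      # (15, 1, 28, 28) (15,)
print(tuple(flat.shape))                   # (15, 784)
```

The last batch is smaller because 100 is not a multiple of 15; the `view(-1, 28*28)` call copes with that since the batch dimension is inferred.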

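The evaluate function measures the network's recognition accuracy.

It takes the test data one batch at a time and computes the network's predictions.

It then checks each result in the batch against its label and accumulates the number of correct predictions.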
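The per-sample comparison can also be written as a single vectorized operation. The sketch below uses hypothetical outputs for a batch of 4 samples and 3 classes, and shows that the loop and the vectorized form count the same number of correct predictions.

```python
import torch

# Hypothetical network outputs for a batch of 4 samples and 3 classes.
outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.2, 0.3, 0.5],
                        [0.6, 0.3, 0.1]])
y = torch.tensor([1, 0, 1, 0])             # true labels

# Per-sample loop: compare the index of the largest output with the label.
n_correct_loop = 0
for i, output in enumerate(outputs):
    if torch.argmax(output) == y[i]:
        n_correct_loop += 1

# Vectorized equivalent: argmax along the class dimension for the whole batch.
n_correct_vec = (outputs.argmax(dim=1) == y).sum().item()

print(n_correct_loop, n_correct_vec)       # 3 3
print(n_correct_vec / len(y))              # 0.75
```

The vectorized form avoids a Python-level loop over every sample, which tends to matter on a test set of MNIST's size.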

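nll_loss is the negative log-likelihood loss function.

It does not take a logarithm itself; the logarithm is already performed inside log_softmax, which is why the two are used together.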
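The relationship between `log_softmax` and `nll_loss` can be verified on a tiny example. The logits and targets below are made up for illustration. The second half shows, under the same caveat, why `log_softmax` is preferred over taking `log` of `softmax` when the logits are extreme.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])   # hypothetical raw outputs, 2 samples, 3 classes
targets = torch.tensor([0, 1])             # true class of each sample

log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs, targets)

# nll_loss picks the log-probability of the true class, negates it, and averages.
loss_manual = -(log_probs[0, 0] + log_probs[1, 1]) / 2

# cross_entropy combines log_softmax and nll_loss in one call.
loss_ce = F.cross_entropy(logits, targets)
print(torch.allclose(loss_nll, loss_manual), torch.allclose(loss_nll, loss_ce))

# Numerical stability: with an extreme logit, the softmax probability of the
# other class underflows to 0, so its log is -inf; log_softmax stays finite.
extreme = torch.tensor([[1000.0, 0.0]])
naive = torch.log(torch.softmax(extreme, dim=1))
stable = F.log_softmax(extreme, dim=1)
print(naive)
print(stable)
```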

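epoch: one full pass over the training set. Training for several epochs reuses the same data and so makes better use of it.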
