Python: Developing an Image Classifier with Deep Learning (84)

Developing an AI Application

Going forward, AI algorithms will be incorporated into more and more everyday applications. For example, you might want to include an image classifier in a smartphone app. To do this, you'd use a deep learning model trained on hundreds or thousands of images as part of the overall application architecture. A large part of software development in the future will be using these types of models as common parts of applications.

In this project, you'll train an image classifier to recognize different species of flowers. You can imagine using something like this in a phone app that tells you the name of the flower your camera is looking at. In practice you'd train this classifier, then export it for use in your application. We'll be using this dataset of 102 flower categories; you can see a few examples below.

(image: sample flowers from the dataset)

The project is broken down into multiple steps:

  • Load and preprocess the image dataset
  • Train the image classifier on your dataset
  • Use the trained classifier to predict image content

We'll lead you through each part, and you'll implement the steps in Python.

When you've completed this project, you'll have an application that can be trained on any set of labeled images. Here your network will be learning about flowers and end up as a command-line application. But what you do with your new skills depends on your imagination and the effort you put into building a dataset. For example, imagine an app where you take a picture of a car, it tells you the make and model, then looks up information about it. Go build your own dataset and make something new.

First up, import the packages you'll need. It's good practice to keep all the imports at the beginning of your code. As you work through this notebook and find you need to import a package, make sure to add the import up here at the top.

# Imports here
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
import numpy as np
import time

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torch.autograd import Variable
from torchvision import datasets, transforms, models
# from matplotlib.ticker import FormatStrFormatter

Loading the data

Here you'll use torchvision to load the data (documentation). The data should be included alongside this notebook, otherwise you can download it here. The dataset is split into three parts: training, validation, and testing. For the training set, you'll want to apply transformations such as random scaling, cropping, and flipping. This helps the network generalize and leads to better performance. You'll also need to make sure the input data is resized to 224x224 pixels, as required by the pre-trained networks.

The validation and testing sets are used to measure the model's performance on data it hasn't seen yet. For these you don't want any scaling or rotation transformations, but you'll need to crop the images to the appropriate size.

For all three sets you'll need to normalize the means and standard deviations of the images to what the network expects. The means are [0.485, 0.456, 0.406] and the standard deviations are [0.229, 0.224, 0.225]. Subtracting the means and dividing by the standard deviations centers each color channel around 0 (with values falling roughly between -2 and 2) instead of between 0 and 1.
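As a quick check of the arithmetic, each channel value is mapped to (value - mean) / std, so a fully black pixel (0 after ToTensor) lands at roughly -2.12, -2.04, and -1.80 in the three channels; these are exactly the values you'll see in the tensor printouts later in this notebook:

# Sanity check of the Normalize arithmetic: normalized = (value - mean) / std
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
print([(0.0 - m) / s for m, s in zip(mean, std)])
# [-2.1179039301310043, -2.0357142857142856, -1.8044444444444445]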

train_dir = './flowers/train'  
valid_dir = './flowers/valid'
test_dir = './flowers/test'

# Path convention: /flowers/train/1/a_flower.jpg, where 1 is the class label
# TODO: Define your transforms for the training, validation, and testing sets

# Training set
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406], 
                                                            [0.229, 0.224, 0.225])])
# Test set
test_transforms = transforms.Compose([transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406], 
                                                           [0.229, 0.224, 0.225])])
# Validation set
validation_transforms = transforms.Compose([transforms.Resize(256),
                                            transforms.CenterCrop(224),
                                            transforms.ToTensor(),
                                            transforms.Normalize([0.485, 0.456, 0.406], 
                                                                 [0.229, 0.224, 0.225])])

# TODO: Load the datasets with ImageFolder
train_data = datasets.ImageFolder(train_dir, transform=train_transforms)
test_data = datasets.ImageFolder(test_dir, transform=test_transforms)
valid_data = datasets.ImageFolder(valid_dir, transform=validation_transforms)

# TODO: Using the image datasets and the trainforms, define the dataloaders
trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=32, shuffle=True)
validloader = torch.utils.data.DataLoader(valid_data, batch_size=32, shuffle=True)
# show image
def imageshow(image, ax=None, title=None, normalize=True):
    """Imshow for Tensor."""
    if ax is None:
        fig, ax = plt.subplots()
    image = image.numpy().transpose((1, 2, 0))
    print("image-", image)

    if normalize:
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        image = std * image + mean
        image = np.clip(image, 0, 1)

    ax.imshow(image)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.tick_params(axis='both', length=0)
    ax.set_xticklabels('')
    ax.set_yticklabels('')

    return ax
# Grab a batch and inspect one image
images, labels = next(iter(trainloader))
images[0]
tensor([[[-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
         [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
         [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
         ...,
         [-2.1179, -2.1179, -2.1179,  ...,  0.5707,  0.5707,  0.5707],
         [-2.1179, -2.1179, -2.1179,  ...,  0.5707,  0.5364,  0.4508],
         [-2.1179, -2.1179, -2.1179,  ...,  0.5193,  0.5022,  0.4679]],

        [[-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
         [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
         [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
         ...,
         [-2.0357, -2.0357, -2.0357,  ...,  1.0980,  1.1331,  1.1506],
         [-2.0357, -2.0357, -2.0357,  ...,  1.2031,  1.1856,  1.0980],
         [-2.0357, -2.0357, -2.0357,  ...,  1.1681,  1.1681,  1.1331]],

        [[-1.8044, -1.8044, -1.8044,  ..., -1.8044, -1.8044, -1.8044],
         [-1.8044, -1.8044, -1.8044,  ..., -1.8044, -1.8044, -1.8044],
         [-1.8044, -1.8044, -1.8044,  ..., -1.8044, -1.8044, -1.8044],
         ...,
         [-1.8044, -1.8044, -1.8044,  ...,  0.1128,  0.1128,  0.1476],
         [-1.8044, -1.8044, -1.8044,  ...,  0.0605,  0.0256, -0.0615],
         [-1.8044, -1.8044, -1.8044,  ...,  0.0082, -0.0267, -0.1138]]])
# Convert the tensor back into an image
imageshow(images[0])
image- [[[-2.117904   -2.0357141  -1.8044444 ]
  [-2.117904   -2.0357141  -1.8044444 ]
  ...
  [ 0.5193082   1.1680672   0.0081918 ]
  [ 0.50218344  1.1680672  -0.02666659]
  [ 0.46793392  1.1330533  -0.11381256]]]

(image: the tensor rendered back into the flower image)

Label mapping

You'll also need to load in a mapping from category label to category name. You can find this in the file cat_to_name.json. It's a JSON object you can read in with the json module. This gives you a dictionary mapping the integer-encoded categories to the actual names of the flowers.

import json

with open('cat_to_name.json', 'r') as f:
    cat_to_name = json.load(f)

print('cat_to_name-', cat_to_name)

Output:

    cat_to_name- {'21': 'fire lily', '3': 'canterbury bells', '45': 'bolero deep blue', '1': 'pink primrose', '34': 'mexican aster', '27': 'prince of wales feathers', '7': 'moon orchid', '16': 'globe-flower', '25': 'grape hyacinth', '26': 'corn poppy', '79': 'toad lily', '39': 'siam tulip', '24': 'red ginger', '67': 'spring crocus', '35': 'alpine sea holly', '32': 'garden phlox', '10': 'globe thistle', '6': 'tiger lily', '93': 'ball moss', '33': 'love in the mist', '9': 'monkshood', '102': 'blackberry lily', '14': 'spear thistle', '19': 'balloon flower', '100': 'blanket flower', '13': 'king protea', '49': 'oxeye daisy', '15': 'yellow iris', '61': 'cautleya spicata', '31': 'carnation', '64': 'silverbush', '68': 'bearded iris', '63': 'black-eyed susan', '69': 'windflower', '62': 'japanese anemone', '20': 'giant white arum lily', '38': 'great masterwort', '4': 'sweet pea', '86': 'tree mallow', '101': 'trumpet creeper', '42': 'daffodil', '22': 'pincushion flower', '2': 'hard-leaved pocket orchid', '54': 'sunflower', '66': 'osteospermum', '70': 'tree poppy', '85': 'desert-rose', '99': 'bromelia', '87': 'magnolia', '5': 'english marigold', '92': 'bee balm', '28': 'stemless gentian', '97': 'mallow', '57': 'gaura', '40': 'lenten rose', '47': 'marigold', '59': 'orange dahlia', '48': 'buttercup', '55': 'pelargonium', '36': 'ruby-lipped cattleya', '91': 'hippeastrum', '29': 'artichoke', '71': 'gazania', '90': 'canna lily', '18': 'peruvian lily', '98': 'mexican petunia', '8': 'bird of paradise', '30': 'sweet william', '17': 'purple coneflower', '52': 'wild pansy', '84': 'columbine', '12': "colt's foot", '11': 'snapdragon', '96': 'camellia', '23': 'fritillary', '50': 'common dandelion', '44': 'poinsettia', '53': 'primula', '72': 'azalea', '65': 'californian poppy', '80': 'anthurium', '76': 'morning glory', '37': 'cape flower', '56': 'bishop of llandaff', '60': 'pink-yellow dahlia', '82': 'clematis', '58': 'geranium', '75': 'thorn apple', '41': 'barbeton daisy', '95': 'bougainvillea', '43': 'sword lily', '83': 'hibiscus', '78': 'lotus lotus', '88': 'cyclamen', '94': 'foxglove', '81': 'frangipani', '74': 'rose', '89': 'watercress', '73': 'water lily', '46': 'wallflower', '77': 'passion flower', '51': 'petunia'}

Building and training the classifier

Now that the data is ready, it's time to build and train the classifier. As usual, you should use one of the pre-trained models from torchvision.models to get the image features. Build and train a new feed-forward classifier using those features.

We're going to leave this part up to you. If you want to talk it through with someone else, feel free to discuss it with your fellow students! You can also ask questions on the forums or check in with our course managers and mentors during office hours.

  • Load a pre-trained network (if you need a starting point, the VGG networks work great and are straightforward to use)
  • Define a new, untrained feed-forward network as a classifier, using ReLU activations and dropout
  • Train the classifier layers using backpropagation, using the pre-trained network to get the features
  • Track the loss and accuracy on the validation set to determine the best hyperparameters

We've left a cell open for you below, but feel free to use more. We recommend breaking the problem up into smaller parts you can run separately. Check that each part is doing what you expect, then move on to the next. You'll likely find that as you work through each part, you'll need to go back and modify your previous code. This is totally normal!

When training, make sure you're updating only the weights of the feed-forward network. You should be able to get the validation accuracy above 70% if you build everything right. Make sure to try different hyperparameters (learning rate, units in the classifier, epochs, etc.) to find the best model. Save those hyperparameters to use as defaults for the next part of the project.

# Print the torch version to decide whether to use the deprecated Variable API.
# Since 0.4, Variable has been merged into Tensor: the automatic differentiation
# that Variable used to provide is supported by Tensor directly.
# You can still write Variable(tensor), but that call is now a no-op.
# New code should therefore use Tensor directly, since the official docs have
# marked Variable as a deprecated module.
# To enable autograd on a tensor, just set tensor.requires_grad = True
# @doc https://github.com/corwien/pytorch-handbook/blob/master/chapter2/2.1.2-pytorch-basics-autograd.ipynb

torch.__version__

# Since this torch version is 0.4.0, the Variable-era style still works below
'0.4.0'
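As a minimal sketch of the newer style (any torch >= 0.4), autograd works on plain tensors without Variable:

# A plain tensor with requires_grad=True tracks operations for autograd
x = torch.ones(3, requires_grad=True)
y = (x ** 2).sum()   # forward computation builds the graph
y.backward()         # backward pass computes dy/dx = 2x
print(x.grad)        # tensor([2., 2., 2.])
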
# TODO: Build and train your network
# pre-trained networks

# Build the model on top of a pre-trained network
def nn_model(dropout=0.5, hidden_layer1 = 120, lr = 0.001):

    # Backbone options: densenet121, vgg16, alexnet
    model = models.densenet121(pretrained=True)

    for param in model.parameters():
        param.requires_grad = False

    from collections import OrderedDict
    classifier = nn.Sequential(OrderedDict([
            ('dropout',nn.Dropout(dropout)),
            ('inputs', nn.Linear(1024, hidden_layer1)),
            ('relu1', nn.ReLU()),
            ('hidden_layer1', nn.Linear(hidden_layer1, 90)),
            ('relu2',nn.ReLU()),
            ('hidden_layer2',nn.Linear(90,80)),
            ('relu3',nn.ReLU()),
            ('hidden_layer3',nn.Linear(80,102)),
            ('output', nn.LogSoftmax(dim=1))

          ]))

    model.classifier = classifier
    criterion = nn.NLLLoss()
    optimizer = optim.Adam(model.classifier.parameters(), lr)

    # use GPU
    model.cuda() 
    return model, optimizer, criterion
model,optimizer,criterion = nn_model()

print('model:', model)

Output:

    /opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.

    model: DenseNet(
      (features): Sequential(
        (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu0): ReLU(inplace)
        (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (denseblock1): _DenseBlock(
          (denselayer1): _DenseLayer(
            (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu1): ReLU(inplace)
            (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu2): ReLU(inplace)
            (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          )
        ...
          (denselayer24): _DenseLayer(
            (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu1): ReLU(inplace)
            (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu2): ReLU(inplace)
            (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          )
        )

        ...
          (denselayer16): _DenseLayer(
            (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu1): ReLU(inplace)
            (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu2): ReLU(inplace)
            (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          )
        )
        (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (classifier): Sequential(
        (dropout): Dropout(p=0.5)
        (inputs): Linear(in_features=1024, out_features=120, bias=True)
        (relu1): ReLU()
        (hidden_layer1): Linear(in_features=120, out_features=90, bias=True)
        (relu2): ReLU()
        (hidden_layer2): Linear(in_features=90, out_features=80, bias=True)
        (relu3): ReLU()
        (hidden_layer3): Linear(in_features=80, out_features=102, bias=True)
        (output): LogSoftmax()
      )
    )
# The actual training loop
epochs = 5
steps = 0
print_every = 40
loss_show=[]
running_loss = 0

# change to cuda
model.to('cuda')

for e in range(epochs):
    running_loss = 0
    for ii, (inputs, labels) in enumerate(trainloader):
        steps += 1

        # cuda
        inputs,labels = inputs.to('cuda'), labels.to('cuda')

        # Zero out the accumulated gradients
        optimizer.zero_grad()

        # In deep learning, logits are the values fed into softmax; they also
        # reflect probabilities, softmax just normalizes them to sum to 1.
        # 1. Forward pass through the network to get the logits
        # 2. Use the logits to calculate the loss
        # 3. Backward pass through the network with loss.backward() to calculate the gradients
        # 4. Take a step with the optimizer to update the weights

        # Forward pass: compute the network's outputs
        outputs = model.forward(inputs)

        # Compute the loss
        loss = criterion(outputs, labels)

        # Backpropagation
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # After updating the weights, periodically check loss and accuracy
        # on the validation set

        if steps % print_every == 0:
            model.eval()
            vlost = 0
            accuracy = 0

            for ii, (inputs2, labels2) in enumerate(validloader):
                inputs2, labels2 = inputs2.to('cuda:0'), labels2.to('cuda:0')
                model.to('cuda:0')

                # No gradients needed during validation
                with torch.no_grad():
                    outputs = model.forward(inputs2)
                    vlost += criterion(outputs, labels2).item()  # accumulate over all batches
                    ps = torch.exp(outputs).data

                    # print('ps-', ps)
                    equality = (labels2.data == ps.max(1)[1])
                    accuracy += equality.type_as(torch.FloatTensor()).mean()

            vlost = vlost / len(validloader)
            accuracy = accuracy / len(validloader)

            print("Epoch: {}/{}... ".format(e+1, epochs),
                  "Loss: {:.4f}".format(running_loss/print_every),
                  "Validation Loss {:.4f}".format(vlost),
                  "Accuracy: {:.4f}".format(accuracy))

            running_loss = 0
            model.train()  # back to training mode so dropout is re-enabled
Epoch: 1/5...  Loss: 3.3673 Validation Loss 0.1046 Accuracy: 0.3467
Epoch: 1/5...  Loss: 2.5125 Validation Loss 0.0875 Accuracy: 0.4618
Epoch: 2/5...  Loss: 0.8436 Validation Loss 0.0690 Accuracy: 0.5634
Epoch: 2/5...  Loss: 1.7666 Validation Loss 0.0645 Accuracy: 0.5709
Epoch: 2/5...  Loss: 1.5933 Validation Loss 0.0454 Accuracy: 0.6480
Epoch: 3/5...  Loss: 1.2031 Validation Loss 0.0447 Accuracy: 0.6518
Epoch: 3/5...  Loss: 1.3355 Validation Loss 0.0398 Accuracy: 0.6934
Epoch: 4/5...  Loss: 0.3291 Validation Loss 0.0461 Accuracy: 0.7021
Epoch: 4/5...  Loss: 1.2077 Validation Loss 0.0622 Accuracy: 0.7507
Epoch: 4/5...  Loss: 1.1233 Validation Loss 0.0171 Accuracy: 0.7304
Epoch: 5/5...  Loss: 0.7462 Validation Loss 0.0453 Accuracy: 0.7427
Epoch: 5/5...  Loss: 1.0136 Validation Loss 0.0414 Accuracy: 0.7682

Testing your network

It's good practice to test your trained network on test data, images the network has never seen in either training or validation. This will give you a good estimate of the model's performance on completely new images. Run the test images through the network and measure the accuracy the same way you did with the validation set. You should be able to reach around 70% accuracy on the test set if the model has been trained well.

# TODO: Do validation on the test set
def check_accuracy_test(testloader):
    correct = 0
    total = 0
    model.to('cuda:0')

    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images,labels = images.to('cuda'),labels.to('cuda')
            outputs = model(images)

            # torch.max returns the largest value (and its index) along a dimension
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            print('predicted:', predicted)
            print('labels:', labels)

            correct += (predicted == labels).sum().item()

            print('correct=', correct)

    print('Accuracy of the network on the test images: %d %%' % (100 * correct / total))

check_accuracy_test(testloader)           
predicted: tensor([  69,   38,   81,   83,   77,   95,   74,   90,   90,   43,
          97,   35,   83,   78,   96,   57,   43,   94,   59,   48,
          34,   13,   76,   75,   85,   55,   99,  101,   37,   20,
          48,   77], device='cuda:0')
labels: tensor([  69,   38,   81,   83,   77,   85,   74,   90,   90,   43,
          99,   33,   83,   98,   96,   57,   43,   94,   59,   48,
          34,    0,   36,   75,   69,   55,    0,  101,   98,   20,
          48,   77], device='cuda:0')
correct= 23
predicted: tensor([ 38,  73,  79,  11,  36,  98,  26,  95,  40,  26,  80,   1,
         89,  49,  83,  75,   4,  80,  78,  75,  58,  61,  44,  54,
         77,  87,  87,  83,  65,  80,  70,  48], device='cuda:0')
labels: tensor([ 38,  73,  79,  11,  25,  98,   5,  95,  35,  26,  80,   1,
         89,  27,  83,  35,   4,  80,  78,  75,  58,  64,  44,  54,
         77,  87,  87,  83,  30,  80,  70,  22], device='cuda:0')
correct= 47

predicted: tensor([  26,   17,   55,   73,   89,   96,   74,   61,   56,   41,
          14,   90,   89,   65,   90,   12,   73,   88,   78,   23,
          82,   13,   90,  101,   24,   53,   52,   16,   33,   73,
          76,   89], device='cuda:0')
labels: tensor([  72,   17,   55,   73,   89,   96,   74,   64,   56,   41,
          14,   90,   89,   30,   90,   12,   77,   88,   78,   23,
          82,    9,   92,  101,   24,   53,   52,    6,   33,   73,
          76,   32], device='cuda:0')
correct= 254
predicted: tensor([  90,   78,   99,   43,   63,    6,   44,   32,   24,   41,
          59,    4,   26,  100,   99,   81,   59,   96,   20,    7,
          58,   49,   40,   91,   82,   97,   96,   82,   97,   83,
          89,   21], device='cuda:0')
labels: tensor([ 84,  78,  73,  43,  63,   6,  44,  32,  24,  41,  59,   4,
         26,  99,  98,  81,  59,  96,  20,  24,  58,  49,  40,  36,
         82,  97,  96,  82,  97,  13,  89,  84], device='cuda:0')
correct= 278

predicted: tensor([ 89,  60,  95,  62,  42,  73,  99,  56,  40,  56,  26,  90,
         41,  55,  33,  29,  51,   2,  73,  46,  18,  12,   7,  36,
         93,  77,  49,  84,   7,  95,  45,  43], device='cuda:0')
labels: tensor([ 15,  60,  95,  62,  42,  73,  99,  56,  40,  21,  28,  90,
         41,  55,  33,  29,  51,   2,  73,  38,  18,  12,   7,  86,
         93,  77,  49,  92,   7,  95,  45,  43], device='cuda:0')
correct= 377
...
predicted: tensor([  83,    7,   41,   49,   26,   90,   17,   49,   34,   97,
          49,   83,   43,   36,    7,   43,   64,   73,   81,   77,
           2,   97,   63,   51,    6,   49,  100,   40,   56,   16,
          33,   11], device='cuda:0')
labels: tensor([ 83,   7,  41,  28,  72,  90,  70,  49,  34,  51,  49,  50,
         43,  36,   7,  51,  64,  88,  81,  77,   2,  97,  63,  47,
          6,  86,  76,  93,  56,  16,  33,  11], device='cuda:0')
correct= 618
predicted: tensor([  49,   84,   26,  101,   33,   68,  101,   54,   23,   74,
          88,   90,   63,   61,   94,   55,   75,   29,   89], device='cuda:0')
labels: tensor([  49,   84,   26,  101,   33,   68,   19,   54,   23,    5,
          88,   90,   63,   85,   94,   55,   75,   29,   96], device='cuda:0')
correct= 633
Accuracy of the network on the test images: 77 %

Saving the checkpoint

Now that your network is trained, save the model so you can load it later for making predictions. You'll probably want to save other things as well, such as the mapping of classes to indices, which you get from one of the image datasets: image_datasets['train'].class_to_idx. You can attach this to the model as an attribute, which makes inference easier later on.


# Remember that you'll want to completely rebuild the model later so you can use it for inference.
# Make sure to include any information you need in the checkpoint. If you want to load the model
# and keep training, you'll want to save the number of epochs as well as the optimizer state,
# `optimizer.state_dict`.
# You'll likely want to use this trained model in the next part of the project, so it's best to save it now.

# TODO: Save the checkpoint 
class_to_idx = train_data.class_to_idx
print(class_to_idx)

model.class_to_idx = class_to_idx
model.cpu()  # note the parentheses: model.cpu without them is a no-op

torch.save({'structure':'densenet121',
            'hidden_layer1':120,
            'state_dict':model.state_dict(),
            'class_to_idx':model.class_to_idx},
            'checkpoint.pth'
           )

Output:

    {'1': 0, '10': 1, '100': 2, '101': 3, '102': 4, '11': 5, '12': 6, '13': 7, '14': 8, '15': 9, '16': 10, '17': 11, '18': 12, '19': 13, '2': 14, '20': 15, '21': 16, '22': 17, '23': 18, '24': 19, '25': 20, '26': 21, '27': 22, '28': 23, '29': 24, '3': 25, '30': 26, '31': 27, '32': 28, '33': 29, '34': 30, '35': 31, '36': 32, '37': 33, '38': 34, '39': 35, '4': 36, '40': 37, '41': 38, '42': 39, '43': 40, '44': 41, '45': 42, '46': 43, '47': 44, '48': 45, '49': 46, '5': 47, '50': 48, '51': 49, '52': 50, '53': 51, '54': 52, '55': 53, '56': 54, '57': 55, '58': 56, '59': 57, '6': 58, '60': 59, '61': 60, '62': 61, '63': 62, '64': 63, '65': 64, '66': 65, '67': 66, '68': 67, '69': 68, '7': 69, '70': 70, '71': 71, '72': 72, '73': 73, '74': 74, '75': 75, '76': 76, '77': 77, '78': 78, '79': 79, '8': 80, '80': 81, '81': 82, '82': 83, '83': 84, '84': 85, '85': 86, '86': 87, '87': 88, '88': 89, '89': 90, '9': 91, '90': 92, '91': 93, '92': 94, '93': 95, '94': 96, '95': 97, '96': 98, '97': 99, '98': 100, '99': 101}
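If you also plan to resume training later, as the comments above suggest, the checkpoint can carry the epoch count and optimizer state as well. A minimal sketch, assuming you want that (the filename checkpoint_resume.pth is made up; epochs and optimizer come from the training cell above):

# Hedged sketch of a fuller checkpoint that also allows resuming training
torch.save({'structure': 'densenet121',
            'hidden_layer1': 120,
            'epochs': epochs,                                # how far training got
            'optimizer_state_dict': optimizer.state_dict(),  # needed to resume Adam
            'state_dict': model.state_dict(),
            'class_to_idx': model.class_to_idx},
           'checkpoint_resume.pth')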

Loading the checkpoint

At this point it's good to write a function that can load a checkpoint and rebuild the model. That way you can come back to this project and keep working on it without having to retrain the network.

# TODO: Write a function that loads a checkpoint and rebuilds the model
def load_model(path):
    checkpoint = torch.load(path)
    structure = checkpoint['structure']  # not used yet; only densenet121 is built
    hidden_layer1 = checkpoint['hidden_layer1']
    model, _, _ = nn_model(0.5, hidden_layer1)
    model.class_to_idx = checkpoint['class_to_idx']

    model.load_state_dict(checkpoint['state_dict'])
    return model  # without this return, the rebuilt model would be lost

# Call it and print the rebuilt model
model = load_model('checkpoint.pth')
print(model)

Output:


    DenseNet(
      (features): Sequential(
        (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu0): ReLU(inplace)
        (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (denseblock1): _DenseBlock(
          (denselayer1): _DenseLayer(
            (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu1): ReLU(inplace)
            (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (relu2): ReLU(inplace)
            (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          )

         ...

        (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (classifier): Sequential(
        (dropout): Dropout(p=0.5)
        (inputs): Linear(in_features=1024, out_features=120, bias=True)
        (relu1): ReLU()
        (hidden_layer1): Linear(in_features=120, out_features=90, bias=True)
        (relu2): ReLU()
        (hidden_layer2): Linear(in_features=90, out_features=80, bias=True)
        (relu3): ReLU()
        (hidden_layer3): Linear(in_features=80, out_features=102, bias=True)
        (output): LogSoftmax()
      )
    )

Inference for classification

Now you'll write a function to use a trained network for inference. That is, you'll pass an image into the network and predict the class of the flower in the image. Write a function called predict that takes an image and a model, then returns the top $K$ most likely classes along with their probabilities. It should look like this:

'''
probs, classes = predict(image_path, model)
print(probs)
print(classes)
> [ 0.01558163  0.01541934  0.01452626  0.01443549  0.01407339]
> ['70', '3', '45', '62', '55']
'''

First you'll need to handle processing the input image so it can be used in your network.

Image preprocessing

You'll want to use PIL to load the image (documentation). It's best to write a function that preprocesses the image so it can be used as input for the model. This function should process the images in the same manner used for training.

First, resize the images so the shortest side is 256 pixels, keeping the aspect ratio. This can be done with the thumbnail or resize methods. Then crop out the center 224x224 portion of the image.

Color channels of images are typically encoded as integers 0-255, but the model expects floats 0-1. You'll need to convert the values. It's easiest with a Numpy array, which you can get from a PIL image like so: np_image = np.array(pil_image).

As before, the network expects the images to be normalized in a specific way. The means are [0.485, 0.456, 0.406] and the standard deviations are [0.229, 0.224, 0.225]. Subtract the means from each color channel, then divide by the standard deviation.

And finally, PyTorch expects the color channel to be the first dimension, but it's the third dimension in the PIL image and Numpy array. You can reorder dimensions using ndarray.transpose. The color channel needs to be first, and the order of the other two dimensions should be retained.
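For reference, here is a manual sketch of exactly those steps (resize the shortest side to 256, center-crop 224x224, scale to 0-1, normalize, move the color channel first). It's only an illustration under the assumptions above; the notebook's actual process_image below leans on torchvision transforms instead, and the name process_image_manual is made up:

import numpy as np
from PIL import Image

def process_image_manual(image_path):
    pil_image = Image.open(image_path)

    # Resize so the shortest side is 256 pixels, keeping the aspect ratio
    w, h = pil_image.size
    if w < h:
        pil_image = pil_image.resize((256, int(256 * h / w)))
    else:
        pil_image = pil_image.resize((int(256 * w / h), 256))

    # Crop out the center 224x224 portion
    w, h = pil_image.size
    left, top = (w - 224) // 2, (h - 224) // 2
    pil_image = pil_image.crop((left, top, left + 224, top + 224))

    # Scale the 0-255 integer channels to 0-1 floats
    np_image = np.array(pil_image) / 255.0

    # Normalize each channel, then move the color channel to the first dimension
    np_image = (np_image - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    return np_image.transpose((2, 0, 1))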

from PIL import Image

def process_image(image):
    ''' Scales, crops, and normalizes a PIL image for a PyTorch model;
        this version leans on torchvision transforms and returns a tensor
        ready to feed to the model (rather than a Numpy array)
    '''
    # Load the image
    img_pil = Image.open(image)

    print(img_pil)

    adjustments = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    img_tensor = adjustments(img_pil)

    return img_tensor

# TODO: Process a PIL image for use in a PyTorch model
data_dir = 'flowers'
img = (data_dir + '/test' + '/1/' + 'image_06752.jpg')
img = process_image(img)
print("image tensor:", img)
print("shape:", img.shape)
image tensor: tensor([[[-2.0494, -2.0494, -2.0323,  ..., -1.9124, -1.9467, -1.9638],
         [-2.0494, -2.0837, -2.0494,  ..., -1.9638, -1.9809, -1.9980],
         [-2.0323, -2.0665, -2.0323,  ..., -1.9980, -1.9980, -1.9809],
         ...,
         [-1.7583, -1.7583, -1.8439,  ..., -1.9638, -1.9980, -2.0152],
         [-1.7754, -1.7925, -1.9124,  ..., -1.9467, -2.0152, -2.0323],
         [-1.7583, -1.8097, -1.9124,  ..., -1.9638, -1.9980, -2.0152]],
         ...,
         [-1.5630, -1.6155, -1.6856,  ..., -1.6856, -1.6681, -1.6856],
         [-1.5280, -1.6506, -1.7731,  ..., -1.6681, -1.6856, -1.7031],
         [-1.5280, -1.6681, -1.7731,  ..., -1.6681, -1.6681, -1.7031]],

         ...,
         [-1.6127, -1.6476, -1.7173,  ..., -1.7522, -1.7696, -1.7696],
         [-1.6650, -1.7522, -1.8044,  ..., -1.7347, -1.7522, -1.7696],
         [-1.6650, -1.7696, -1.8044,  ..., -1.7347, -1.7347, -1.7696]]])
shape: torch.Size([3, 224, 224])

To check your work, the function below converts a PyTorch tensor and displays it in the notebook. If your process_image function works, running the output of that function through this one should return the original image (except for the cropped-out portions).

def imshow(image, ax=None, title=None):
    """Imshow for Tensor."""
    if ax is None:
        fig, ax = plt.subplots()

    # PyTorch tensors assume the color channel is the first dimension
    # but matplotlib assumes it is the third dimension
    image = image.numpy().transpose((1, 2, 0))

    # Undo preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean

    # Image needs to be clipped between 0 and 1 or it looks like noise when displayed
    image = np.clip(image, 0, 1)

    ax.imshow(image)

    return ax
# Convert the tensor back into an image and display it
imshow(process_image("flowers/test/1/image_06743.jpg"))

(image: the processed image displayed with imshow)

Class prediction

Once you can get images in the correct format, it's time to write a function for making predictions with your model. A common practice is to predict the top 5 or so most probable classes (usually called top-K).

To get the top $K$ largest values in a tensor, use x.topk(k). This method returns both the k highest probabilities and the indices of those probabilities corresponding to the classes. You need to convert from these indices to the actual class labels using class_to_idx (which hopefully you added to the model) or from the ImageFolder you used to load the data. Make sure to invert the dictionary so you also get a mapping from index to class.

Again, this method should take a path to an image and a model checkpoint, then return the probabilities and classes.

'''
probs, classes = predict(image_path, model)
print(probs)
print(classes)
> [ 0.01558163  0.01541934  0.01452626  0.01443549  0.01407339]
> ['70', '3', '45', '62', '55']
'''
model.class_to_idx =train_data.class_to_idx

ctx = model.class_to_idx

def predict(image_path, model, topk=5):
    ''' Predict the class (or classes) of an image using a trained deep learning model.
    '''
    model.to('cuda:0')
    img_torch = process_image(image_path)
    img_torch = img_torch.unsqueeze_(0)
    img_torch = img_torch.float()

    with torch.no_grad():
        output = model.forward(img_torch.cuda())

    probability = F.softmax(output.data,dim=1)

    return probability.topk(topk)

# TODO: Implement the code to predict the class from an image file
img = (data_dir + '/test' + '/10/' + 'image_07104.jpg')
val1, val2 = predict(img, model)
print(val1)
print(val2)
tensor([[ 0.3979,  0.3101,  0.2026,  0.0344,  0.0162]], device='cuda:0')
tensor([[  8,   1,  31,  94,  29]], device='cuda:0')
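Note that the second tensor holds ImageFolder indices, not the dataset's class labels. A minimal sketch of the inversion described earlier, assuming model.class_to_idx was attached as above:

# Invert class_to_idx (label -> index) into index -> label, then use
# cat_to_name to turn the label strings into flower names
idx_to_class = {idx: cls for cls, idx in model.class_to_idx.items()}
probs, indices = predict(img, model)
classes = [idx_to_class[int(i)] for i in indices[0]]
names = [cat_to_name[c] for c in classes]
print(classes)  # label strings such as '10', not raw indices
print(names)    # the corresponding flower names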

Sanity checking

Now that you can use a trained model for predictions, check to make sure it makes sense. Even if the testing accuracy is high, it's always good to check that there aren't obvious bugs. Use matplotlib to plot the probabilities for the top 5 classes as a bar graph, along with the input image. It should look like this:

(image: example sanity check, the input image above a bar chart of the top 5 class probabilities)

You can convert from the class integer encoding to actual flower names with the cat_to_name.json file (it should have been loaded earlier in the notebook). To show a PyTorch tensor as an image, use the imshow function defined above.

# TODO: Display an image along with the top 5 classes
from matplotlib.ticker import FormatStrFormatter

def check_res():
    plt.rcParams["figure.figsize"] = (10,5)
    plt.subplot(211)

    index = 1
    path = test_dir + '/1/image_06743.jpg'

    probabilities = predict(path, model)
    image = process_image(path)

    axs = imshow(image, ax = plt)
    axs.axis('off')
    axs.title(cat_to_name[str(index)])
    axs.show()

    # Move the tensors back to the CPU before converting to numpy, and map the
    # ImageFolder indices back to class labels before looking up flower names
    idx_to_class = {v: k for k, v in model.class_to_idx.items()}
    a = np.array(probabilities[0][0].cpu())
    b = [cat_to_name[idx_to_class[int(i)]] for i in np.array(probabilities[1][0].cpu())]

    N=float(len(b))
    fig,ax = plt.subplots(figsize=(8,3))
    width = 0.8
    tickLocations = np.arange(N)
    ax.bar(tickLocations, a, width, linewidth=4.0, align = 'center')
    ax.set_xticks(ticks = tickLocations)
    ax.set_xticklabels(b)
    ax.set_xlim(min(tickLocations)-0.6,max(tickLocations)+0.6)
    ax.set_yticks([0.2,0.4,0.6,0.8,1,1.2])
    ax.set_ylim((0,1))
    ax.yaxis.grid(True)
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))

    plt.show()
check_res()

(images: the input flower image and the bar chart of the top 5 predicted flower classes)

Those who act often succeed; those who keep walking often arrive.