Caffe in Practice on Windows: Fine-Tuning for Dogs vs. Cats

Author: 钱彩红

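Purpose

The previous article showed how to train Caffe on your own images on Windows. In practice, a small training set often leads to poor results, and a lot of time can go into parameter tuning without the results improving much. This article again uses Dogs vs. Cats as the example and shows how to take a network and model that others have already trained and fine-tune them on your own dataset, so that good results are reached quickly.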

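Environment

The tools and commands in this article apply to the CPU or GPU build of BVLC Caffe on Windows (BVLC Caffe must already be installed and built successfully on the machine), and also to clCaffe on Windows. Note that clCaffe is a modified version that uses the integrated graphics of Intel Skylake and later processors for hardware acceleration.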

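Materials to Prepare

Prepare your own image data. We again use the images from Kaggle's Dogs vs. Cats competition, available at https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/data
Download the pretrained model package.

The Model Zoo provided by BVLC contains many trained classic models. Here we use a 1000-class ImageNet model: the Caffe team trained it on ImageNet images for more than 300,000 iterations, and it classifies an image into one of 1000 classes. The model can be downloaded from http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel

Extract everything you downloaded into the \bvlc subdirectory of your own working directory.

The package is quite complete: it contains the solver file, the deploy file, the caffemodel file, and the other files needed, all of which are used in the following steps.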

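Dataset Preprocessing

With our own image dataset ready and the pretrained model files downloaded, we can start fine-tuning. First, split the images into train and val sets, convert them to LMDB format, and generate the mean file, because fine-tuning runs on the LMDB databases and mean file built from our own data. For the detailed data processing steps, please refer to the previous article.

After processing, we have the LMDB databases and the mean file; in this example they are train_imgSet.lmdb, val_imgSet.lmdb, and dogsvscats.mean.binaryproto, the names used in train_val.prototxt later in this article.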
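
The train/val split can be sketched in Python as follows. This is only an illustration under stated assumptions: the file names follow the Kaggle pattern cat.N.jpg / dog.N.jpg, cat is labeled 0 and dog 1, and each output line uses the "path label" form that Caffe's convert_imageset tool reads. The function name make_split and the tiny file list are hypothetical.

```python
import random


def make_split(filenames, val_count, seed=0):
    """Label files by name prefix and split them into train and val lists.

    Assumptions (not from the original article): names look like
    "cat.0.jpg" / "dog.0.jpg"; cat is label 0 and dog is label 1.
    """
    labels = {"cat": 0, "dog": 1}
    lines = []
    for name in sorted(filenames):
        prefix = name.split(".")[0]
        lines.append(f"{name} {labels[prefix]}")
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(lines)
    return lines[val_count:], lines[:val_count]


# Tiny stand-in for the real image directory listing.
files = [f"cat.{i}.jpg" for i in range(6)] + [f"dog.{i}.jpg" for i in range(6)]
train_lines, val_lines = make_split(files, val_count=4)
print(len(train_lines), len(val_lines))
```

For the real dataset, val_count would be 5000 to match the test set size used later in the solver settings.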

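Modifying the Pretrained Model Parameters

Next, modify the parameter files of the downloaded ImageNet model.

Modify the solver file (\bvlc\solver.prototxt)

Change the path and file name of the network file: set the value after net: to the path and name of the network file actually used.
Change test_iter. It was originally 1000. Our test set is small, 5000 images in total, and the test batch_size is 50, so we set it to 100 (100 x 50 = 5000).
Lower base_lr from 0.01 to 0.001. The base_lr should not be too large when fine-tuning.
Reduce max_iter as well, since we do not have that much data. We set it to 10000 first to see how it performs.
Reduce stepsize to 5000. For one thing, max_iter is now 10000, so a stepsize larger than that would never take effect. For another, we want the learning rate to drop faster during training, so that it becomes smaller once stepsize iterations have been reached.
Change display to 100 so that progress is printed every 100 iterations.
Change snapshot_prefix to the path and name we need.
As the comparison below shows, test_interval is also reduced from 1000 to 500 and snapshot from 10000 to 5000. The other parameters stay unchanged.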

The solver file before and after modification is compared below.

Before:

#imagenet
net: "models/bvlc_reference_caffenet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU

After:

#dogsvscats
net: "train_val.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 5000
display: 100
max_iter: 10000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "models/finetune"
solver_mode: GPU
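
To see what the modified solver settings mean for the learning rate, here is a small sketch of Caffe's "step" policy, which computes base_lr * gamma ^ floor(iter / stepsize). The default arguments are the values from the modified solver above; the function name step_lr is hypothetical.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=5000):
    """Learning rate under the "step" policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


# The rate stays at base_lr until iteration 5000, then drops by a factor of 10.
for it in (0, 4999, 5000, 9999):
    print(it, step_lr(it))
```

With max_iter set to 10000, the learning rate is reduced exactly once during the run, at iteration 5000.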

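Modify the network file train_val.prototxt

In the data layers, change the data source and mean file settings to the paths and file names of our own LMDB and mean file.
Change crop_size to 208. Our images were already resized to 208x208 when the mean file was generated, so leaving crop_size at 227 causes a data size mismatch error.
Change num_output of the last layer from 1000 to 2, because the original example is a 1000-class problem and ours is a 2-class problem. This layer usually has to be changed when fine-tuning on your own data.
Rename the last fully connected layer to "fc8-dogcat". Because the pretrained model contains no layer with this name, the layer is initialized with new random values, which is how the network adapts to the new task. Every place that refers to the last layer's name must be changed too; a batch find-and-replace works well.
Speed up learning in the last layer. The newly modified layer has to learn from scratch on the new data and therefore needs a higher learning rate, so we raise lr_mult for its weights and bias by a factor of 10 (from 1 and 2 to 10 and 20).
The modified file below also reduces the training batch_size from 256 to 128 and renames the network to DogCatNet.

The train_val.prototxt before and after modification is compared below:

Before: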
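
Because fc6 is still loaded from the pretrained model, its input size has to stay the same after crop_size changes from 227 to 208. The sketch below checks this by following the layer parameters in train_val.prototxt, assuming Caffe's usual rounding (convolution output sizes round down, pooling output sizes round up). The helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    # Convolution output size, rounded down.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    # Pooling output size, rounded up.
    return math.ceil((size - kernel) / stride) + 1


def pool5_size(crop):
    """Spatial side length of pool5 for a square input of side `crop`."""
    s = conv_out(crop, kernel=11, stride=4)   # conv1
    s = pool_out(s, kernel=3, stride=2)       # pool1
    s = conv_out(s, kernel=5, pad=2)          # conv2
    s = pool_out(s, kernel=3, stride=2)       # pool2
    s = conv_out(s, kernel=3, pad=1)          # conv3
    s = conv_out(s, kernel=3, pad=1)          # conv4
    s = conv_out(s, kernel=3, pad=1)          # conv5
    return pool_out(s, kernel=3, stride=2)    # pool5


for crop in (227, 208):
    side = pool5_size(crop)
    # conv5 has 256 output channels, so fc6 sees side * side * 256 inputs.
    print(crop, side, side * side * 256)
```

Both crop sizes give a 6x6 pool5 map, that is 9216 inputs to fc6, so the pretrained fc6 weights still fit. A crop size that changed this number would require renaming fc6 as well.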

#imagenet
name: "CaffeNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: true
#  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: false
#  }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

After:

#dogsvscats
name: "DogCatNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 208
    mean_file: "dogsvscats.mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: true
#  }
  data_param {
    source: "train_imgSet.lmdb"
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 208
    mean_file: "dogsvscats.mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: false
#  }
  data_param {
    source: "val_imgSet.lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8-dogcat"		#Change this layer name
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8-dogcat"
  param {
    lr_mult: 10
    decay_mult: 1
  }
  param {
    lr_mult: 20
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2		#change to 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8-dogcat"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8-dogcat"
  bottom: "label"
  top: "loss"
}

Running the Training Command

Finally, run the training command:

C:\Projects\caffe\build\tools\Release\caffe.exe train --solver=solver.prototxt --weights bvlc/bvlc_reference_caffenet.caffemodel

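By iteration 2000 the accuracy had already reached 0.9658.

This shows that fine-tuning reaches good results quickly. In this example, max_iter could even be set lower and still give good training results.

With the steps above done, compare this with the command for training directly: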

C:\Projects\caffe\build\tools\Release\caffe.exe train --solver=solver.prototxt

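Fine-tuning from a pretrained model is therefore very similar to training directly. The only difference is whether the command passes the weights argument for initialization.
a. Without the argument, training starts from the initialization specified in the network definition (for example constant or gaussian fillers).
b. Fine-tuning from an existing model reads that model's parameter file and uses its values as the initial weights.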
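
The two cases can be illustrated with a simplified model of how the weights file is applied. This is not Caffe code, only a sketch of the matching-by-name behavior: layers whose names exist in the pretrained model get its weights, layers with unknown names keep their filler initialization, and a layer with a known name but a different shape is an error. The function name and the shape tuples are hypothetical.

```python
def init_from_pretrained(net_layers, pretrained):
    """Decide where each layer's initial weights come from.

    net_layers: {layer name: weight shape} from the new train_val.prototxt.
    pretrained: {layer name: weight shape} stored in the .caffemodel.
    Returns {layer name: "copied" or "filler"}.
    """
    source = {}
    for name, shape in net_layers.items():
        if name not in pretrained:
            # No layer of that name in the model: keep the filler initialization.
            source[name] = "filler"
        elif pretrained[name] != shape:
            # Same name but a different shape: loading fails instead of guessing.
            raise ValueError(f"shape mismatch for layer {name!r}")
        else:
            source[name] = "copied"
    return source


pretrained = {"fc7": (4096, 4096), "fc8": (1000, 4096)}
renamed = {"fc7": (4096, 4096), "fc8-dogcat": (2, 4096)}
print(init_from_pretrained(renamed, pretrained))
```

This is why the last layer is renamed rather than only resized: keeping the name "fc8" while changing num_output to 2 would hit the shape mismatch case.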

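Key Points and Notes for Fine-Tuning

When running the training command, supply the pretrained weights for training on the new dataset. The pretrained weights are loaded into the model and matched to each layer by name.
Do not set base_lr too high, because a large learning rate updates the weights of the original model too quickly. This value is generally set no higher than 0.001.
The new task usually differs from that of the original model; in this example the pretrained model has 1000 classes while the actual task is binary. So the last layer of the model usually has to be modified: rename the last layer in the prototxt, and the layer with the new name starts training from random weights.
Likewise, to make specific layers start from random weights, rename those layers; every renamed layer starts training from random weights.
Lower the overall learning rate base_lr in the solver prototxt, but raise lr_mult for the newly introduced layers. The idea is to let the new layers learn quickly on the new data while the other layers change slowly.
Set stepsize in the solver lower than when training from scratch, so that the learning rate decays sooner during training.
To prevent fine-tuning of every layer except the last one entirely, set lr_mult of those layers to 0.
Not every dataset is suitable for fine-tuning. It works best when the features of the new dataset are similar to those of the pretraining dataset, as in this example.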
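
The interaction between base_lr and lr_mult can be made concrete with a short sketch: each parameter's learning rate is the solver's base_lr multiplied by that parameter's lr_mult. The values below come from the modified solver and train_val.prototxt in this article; the helper name effective_lr is hypothetical.

```python
def effective_lr(base_lr, lr_mult):
    """Per-parameter learning rate: base_lr scaled by the layer's lr_mult."""
    return base_lr * lr_mult


base_lr = 0.001
# (weight lr_mult, bias lr_mult) for a few layers of the modified network.
layers = {"conv1": (1, 2), "fc7": (1, 2), "fc8-dogcat": (10, 20)}
for name, (w_mult, b_mult) in layers.items():
    print(name, effective_lr(base_lr, w_mult), effective_lr(base_lr, b_mult))

# An lr_mult of 0 freezes a parameter: its effective learning rate is 0.
frozen = effective_lr(base_lr, 0)
print("frozen", frozen)
```

The new fc8-dogcat weights therefore train at roughly 0.01, the rate the original network used from scratch, while the pretrained layers move ten times more slowly.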

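Model Inference and Deployment

After training we have our own .caffemodel, which can be used for deployment and inference in our own projects. One thing worth mentioning: to run a deep neural network (DNN) on low-performance, low-power, or mobile devices, the Intel Movidius Neural Compute Stick (NCS) can be used for acceleration and deployment. In this example, we use Movidius to run the final trained model and predict a given image: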