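The tools and commands described in this article apply to the CPU or GPU build of BVLC Caffe on Windows (BVLC Caffe must already be installed and successfully compiled on the machine), as well as to clCaffe on Windows. Note, however, that clCaffe is a modified version that uses the integrated graphics of Intel Skylake and later processors for hardware acceleration, so take care when using it.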
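- Prepare your own image data. As before, we use the images from Kaggle's dogsvscats (Dogs vs. Cats) competition. Download: https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/data
- Download the pretrained model package
The Model Zoo provided by BVLC contains many trained classic models. Here we use a 1000-class ImageNet model: the Caffe team trained it on ImageNet images for more than 300,000 iterations, and it classifies images into 1000 categories. Model download: http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel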
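With our own image dataset prepared and the pretrained model files downloaded, we can start fine-tuning. First, split the images into train and val sets, convert them to LMDB format, and generate the mean file, because fine-tuning works from the LMDB and mean file of our own data. For the detailed data-processing steps, please again refer to the earlier article on the Caffe dogs-vs-cats practice on Windows (listed in the references at the end).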
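As a rough illustration of the train/val split, the sketch below shuffles a list of image file names and produces `filename label` lines of the kind usually fed to Caffe's image-to-LMDB conversion. The function name `split_train_val`, the 20% validation share, the fixed seed, and the cat=0 / dog=1 labeling by file-name prefix are assumptions made for this example, not details from the original article.

```python
import random


def split_train_val(filenames, val_fraction=0.2, seed=0):
    """Return (train_lines, val_lines), each line formatted as 'filename label'."""

    def label_of(name):
        # Assumption: file names start with "cat" or "dog", e.g. "cat.0.jpg".
        return 0 if name.lower().startswith("cat") else 1

    names = sorted(filenames)
    random.Random(seed).shuffle(names)  # deterministic shuffle for repeatability
    n_val = int(len(names) * val_fraction)
    val, train = names[:n_val], names[n_val:]

    def to_lines(group):
        return ["%s %d" % (n, label_of(n)) for n in group]

    return to_lines(train), to_lines(val)


demo = ["cat.%d.jpg" % i for i in range(5)] + ["dog.%d.jpg" % i for i in range(5)]
train_lines, val_lines = split_train_val(demo)
print(len(train_lines), "train lines,", len(val_lines), "val lines")
```

The two lists can then be written to text files and converted to LMDB as described in the earlier article.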
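- Change the path and file name of the network file
net: set the value after it to the path and name of the network file actually used
- Change test_iter
The original test_iter is 1000. Our test set is small, only 5000 images in total, and the test batch_size is 50, so we change this value to 100
- Lower base_lr from 0.01 to 0.001. base_lr should not be too large when fine-tuning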
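- Reduce max_iter as well. We do not have that much data, so we start with 10000 and see how it performs
- Reduce stepsize to 5000. For one thing, max_iter is now 10000, so a stepsize larger than that would have no effect. For another, we want the learning rate to drop sooner during training, so that it becomes smaller once stepsize iterations have been reached
- Change display to 100, so that progress is printed every 100 iterations
- Change snapshot_prefix to the path and name we need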
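- Leave the other parameters unchanged (note that the dogsvscats listing below also uses test_interval: 500 and snapshot: 5000)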
#imagenet
net: "models/bvlc_reference_caffenet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train"
solver_mode: GPU
#dogsvscats
net: "train_val.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 5000
display: 100
max_iter: 10000
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "models/finetune"
solver_mode: GPU
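With lr_policy "step", Caffe computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). The sketch below evaluates that formula for the dogsvscats solver values; the function name `step_lr` is only an illustrative helper.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=5000):
    """Learning rate under the "step" policy: base_lr * gamma^floor(iter/stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Defaults mirror the dogsvscats solver: the rate drops once, at iteration 5000.
for it in (0, 4999, 5000, 9999):
    print(it, step_lr(it))
```

With max_iter at 10000 and stepsize at 5000, the first half of training runs at 0.001 and the second half at roughly 0.0001. Had stepsize stayed at 100000, the rate would never have dropped within 10000 iterations.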
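- Change the data source and mean file settings of the data layers to the paths and file names of our own LMDB and mean file
- Change crop_size to 208. Our data was already resized to 208*208 when the mean file was generated earlier, and crop_size cannot exceed the image size, so leaving it unchanged causes an error due to mismatched data size, as shown in the figure below.
- Change num_output of the last layer from 1000 to 2, because the example is a 1000-class problem whereas ours is a 2-class problem. This layer usually has to be changed when fine-tuning on your own data.
- Rename the last fully connected layer to "fc8-dogcat". Because the pretrained model contains no layer with this name, the layer is initialized with new random values, which is how we adapt the network to the new task. Note that every place that uses the name of the last layer must be changed; a batch find-and-replace helps.
- Increase the learning rate of the last layer. The goal is for this newly modified layer to relearn from the new data, which calls for a faster learning rate, so we raise the learning rates of its weights and bias by a factor of 10.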
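A natural question after changing crop_size is whether the pretrained fc6 weights still fit, since the size of fc6's input depends on the spatial size of pool5. The sketch below works through the arithmetic for the layer parameters of this network, assuming Caffe's rounding rules (convolution output sizes round down, pooling output sizes round up). The helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output side length of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    """Output side length of a pooling layer (rounds up)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


def pool5_size(crop_size):
    """Side length of the pool5 feature map for a given crop_size."""
    s = conv_out(crop_size, kernel=11, stride=4)  # conv1
    s = pool_out(s, kernel=3, stride=2)           # pool1
    s = conv_out(s, kernel=5, pad=2)              # conv2
    s = pool_out(s, kernel=3, stride=2)           # pool2
    s = conv_out(s, kernel=3, pad=1)              # conv3
    s = conv_out(s, kernel=3, pad=1)              # conv4
    s = conv_out(s, kernel=3, pad=1)              # conv5
    return pool_out(s, kernel=3, stride=2)        # pool5


for crop in (227, 208):
    side = pool5_size(crop)
    # conv5 has 256 output channels, so fc6 sees 256 * side * side inputs.
    print(crop, side, 256 * side * side)
```

Under these assumptions both crop sizes yield a 6x6 pool5 map, so fc6 receives 256 * 6 * 6 = 9216 inputs in either case, and the pretrained fc6 weights can still be reused with crop_size 208.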
#imagenet
name: "CaffeNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: true
  # }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: false
  # }
  data_param {
    source: "examples/imagenet/ilsvrc12_val_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4096
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4096
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}
#dogsvscats
name: "DogCatNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 208
    mean_file: "dogsvscats.mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: true
  # }
  data_param {
    source: "train_imgSet.lmdb"
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param {
    mirror: false
    crop_size: 208
    mean_file: "dogsvscats.mean.binaryproto"
  }
  # mean pixel / channel-wise mean instead of mean image
  # transform_param {
  #   crop_size: 227
  #   mean_value: 104
  #   mean_value: 117
  #   mean_value: 123
  #   mirror: false
  # }
  data_param {
    source: "val_imgSet.lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4096
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 4096
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 1 }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc8-dogcat"  # Change this layer name
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8-dogcat"
  param { lr_mult: 10 decay_mult: 1 }
  param { lr_mult: 20 decay_mult: 0 }
  inner_product_param {
    num_output: 2  # change to 2
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8-dogcat"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8-dogcat"
  bottom: "label"
  top: "loss"
}
C:\Projects\caffe\build\tools\Release\caffe.exe train --solver=solver.prototxt --weights bvlc/bvlc_reference_caffenet.caffemodel
C:\Projects\caffe\build\tools\Release\caffe.exe train --solver=solver.prototxt
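a. Training directly without the --weights argument initializes the network in the way specified by the network definition (e.g. constant, gaussian)
b. Fine-tuning from an existing model reads the parameter file of the model you already have and uses those parameters as the initial values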
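To make the difference between (a) and (b) concrete, here is a toy model of name-based initialization. It is only an illustration in plain Python, not Caffe's actual loading code; the function name `plan_initialization` and the "pretrained" / "filler" labels are hypothetical, while the layer names come from the listings above.

```python
def plan_initialization(net_layers, pretrained_layers):
    """For each layer with weights, say where its initial values come from."""
    plan = {}
    for name in net_layers:
        # A layer whose name exists in the pretrained model copies its weights;
        # any other layer falls back to the fillers in the network definition.
        plan[name] = "pretrained" if name in pretrained_layers else "filler"
    return plan


# Layers with learnable weights in the pretrained CaffeNet model.
PRETRAINED_LAYERS = {"conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"}
# Layers with learnable weights in the dogsvscats network.
FINETUNE_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8-dogcat"]

# Case b: fine-tuning with --weights.
for layer, source in plan_initialization(FINETUNE_LAYERS, PRETRAINED_LAYERS).items():
    print(layer, source)

# Case a: no --weights, so nothing can be matched and every layer uses its filler.
print(plan_initialization(FINETUNE_LAYERS, set()))
```

In this toy model only fc8-dogcat ends up with filler initialization when weights are supplied, which mirrors why the last layer is renamed.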
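Key points and caveats for fine-tuning
- When running the training command, supply the pretrained weights for training on the new dataset. The pretrained weights are then loaded into the model and matched to each layer by name.
- Do not set the base_lr learning rate too high: with a large learning rate, the weights of the original model are updated too quickly. This value is generally set to no more than 0.001
- The new task usually differs from that of the original model. In this example the pretrained model is a 1000-class classifier, while the actual task is binary classification. So the last layer of the model usually has to be modified: give the last layer in the prototxt a new name, and the layer with the new name will start training from random weights
- Likewise, to make specific layers start training from random weights, simply give those layers new names. Every renamed layer starts training from random weights.
- Reduce the overall learning rate base_lr in the solver prototxt, but increase lr_mult for the newly introduced layers. The main idea is to let the new layers learn quickly from the new data while the other layers learn more slowly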
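- Set stepsize in the solver to a lower value than when training from scratch. With pretrained weights we are effectively already far along in training, so a smaller stepsize makes the learning rate decrease sooner. We can also set lr_mult to 0 to completely prevent fine-tuning of all layers other than the last one.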
- Not every dataset is suitable for fine-tuning. It works best when the new dataset has features fairly similar to those of the pretraining dataset (as in this example)
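The interplay of base_lr and lr_mult in the points above comes down to one multiplication: the rate a parameter actually sees is the solver's current learning rate times that parameter's lr_mult. A minimal sketch using the values from this article; the helper name `effective_lr` and the "frozen layer" row are illustrative assumptions.

```python
def effective_lr(base_lr, lr_mult):
    """Per-parameter learning rate: solver rate scaled by the layer's lr_mult."""
    return base_lr * lr_mult


BASE_LR = 0.001  # base_lr from the dogsvscats solver

# (weight lr_mult, bias lr_mult) per layer.
LR_MULTS = {
    "fc7": (1, 2),             # pretrained layer, unchanged multipliers
    "fc8-dogcat": (10, 20),    # new layer, 10x faster
    "frozen layer": (0, 0),    # hypothetical: lr_mult 0 stops all updates
}

for layer, (w_mult, b_mult) in LR_MULTS.items():
    print(layer, effective_lr(BASE_LR, w_mult), effective_lr(BASE_LR, b_mult))
```

So the new layer's weights train at about 0.01, the same rate the original network used from scratch, while the pretrained layers move ten times more slowly.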
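After training completes, we obtain our own .caffemodel, which can be used for deployment and inference in our own projects. One point worth mentioning: to run deep neural networks (DNN) on low-performance, low-power, or mobile devices, you can use the Intel Movidius Neural Compute Stick (NCS) for acceleration and deployment. In this example, we use Movidius to run a prediction on a given image with the final trained model.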
Caffe practice on Windows, Dogs vs. Cats: https://software.intel.com/zh-cn/articles/the-caffe-practice-on-windows-the-war-between-cat-and-dog
Integrating the NCS SDK with Caffe: https://software.intel.com/zh-cn/articles/how-to-deploy-tensorflow-and-caffe-for-intel-hardware-platform-into-movidius-ncs-sdk
BVLC's official fine-tuning example and notes: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
BVLC Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo