Training and Deploying Deep Learning Networks with Caffe* Optimized for Intel® Architecture

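Summary

Caffe* is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC). It is written in C++ and CUDA* C++ with Python* and MATLAB* wrappers, and it is well suited to convolutional neural networks, recurrent neural networks, and multi-layer perceptrons. There are several forks of the main Caffe branch that add support for detection and classification, segmentation, Spark compatibility, and more.

Caffe optimized for Intel architecture currently integrates the latest release of Intel® Math Kernel Library (Intel® MKL) 2017, which is optimized for Advanced Vector Extensions 2 (AVX2) and AVX-512 instructions; these instruction sets are supported by Intel® Xeon® processors and Intel® Xeon Phi™ processors, among others. That is, Caffe optimized for Intel® architecture contains all the advantages of BVLC Caffe, runs efficiently on Intel architecture, and can be used for distributed training across multiple nodes. This tutorial explains how to build Caffe optimized for Intel architecture, how to train deep network models using one or more compute nodes, and how to deploy networks. It also covers various Caffe features in detail, including how to fine-tune models, how to extract and view the features of different models, and how to use the Caffe Python API.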

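Vocabulary usage

weights: also referred to as kernels, templates, or feature extractors
blob: also referred to as a tensor; an N-dimensional data structure, that is, an N-D tensor that contains data, gradients, or weights (including biases)
units: also referred to as neurons; they perform a non-linear transformation on a data blob
feature maps: also referred to as channels
testing: also referred to as inference, classification, scoring, or deployment
model: also referred to as topology or architecture

A quick way to become familiar with Caffe is to:

Install it
Train and test LeNet on MNIST
Test a pre-trained model, e.g., bvlc_googlenet.caffemodel, on some images, e.g., cat and fish-bike
Fine-tune a trained model on the Cats vs Dogs challenge

Note that parts of this article are based on this blog.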

Installation

The following commands are for Ubuntu* 14.04. Similar commands for other Linux* or OS X* operating systems, or for other Ubuntu versions, can be found on the BVLC Caffe installation website. Get the dependencies:

sudo apt-get update &&
sudo apt-get -y install build-essential git cmake &&
sudo apt-get -y install libprotobuf-dev libleveldb-dev libsnappy-dev &&
sudo apt-get -y install libopencv-dev libhdf5-serial-dev protobuf-compiler &&
sudo apt-get -y install --no-install-recommends libboost-all-dev &&
sudo apt-get -y install libgflags-dev libgoogle-glog-dev liblmdb-dev &&
sudo apt-get -y install libatlas-base-dev

Install the dependencies on CentOS* 7 as follows:

sudo yum -y update &&
sudo yum -y groupinstall "Development Tools"&&
sudo yum -y install wget cmake git &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel &&
sudo yum -y install snappy-devel opencv-devel atlas-devel &&
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel

# The following steps are only required if some packages failed to install
# add EPEL repository then install missing packages
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ivh epel-release-latest-7.noarch.rpm
sudo yum -y install gflags-devel glog-devel lmdb-devel leveldb-devel hdf5-devel &&
sudo yum -y install protobuf-devel protobuf-compiler boost-devel

# if packages are still not found--download and install/build the packages, e.g.,
# snappy:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/snappy-devel-1.1.0-3.el7.x86_64.rpm
# atlas:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
sudo yum -y install http://mirror.centos.org/centos/7/os/x86_64/Packages/atlas-devel-3.10.1-10.el7.x86_64.rpm
# opencv:
wget https://github.com/Itseez/opencv/archive/2.4.13.zip
unzip 2.4.13.zip
cd opencv-2.4.13/
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr/local ..
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make all -j $NUM_THREADS
sudo make install -j $NUM_THREADS

# optional (not required for Caffe)
# other useful repositories for CentOS are RepoForge and IUS:
wget http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
sudo rpm -Uvh rpmforge-release-0.5.3-1.el7.rf.x86_64.rpm
wget https://rhel7.iuscommunity.org/ius-release.rpm
sudo rpm -Uvh ius-release*.rpm

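Reasons for the dependencies (source):

boost: a C++ library, used for its math functions and shared pointers
glog, gflags: provide logging and command-line utilities; essential for debugging
leveldb, lmdb: database IO; used to prepare your own data
protobuf: used to define data structures efficiently
BLAS (Basic Linear Algebra Subprograms): operations such as matrix multiplication and matrix addition, implemented by Intel® Math Kernel Library (Intel® MKL), ATLAS*, openBLAS*, and others

The Caffe installation guide states: Install "MKL for better CPU performance."

For best performance, use Intel® Math Kernel Library (Intel® MKL) 2017, available free of charge as a beta release in Intel® Parallel Studio XE 2017 Beta. The official (gold) release of Intel MKL 2017 is scheduled for September 2016.

Alternatively, download and install Intel MKL 11.3.3 (the 2016 version). To download it, first register for a free community license and then follow the installation instructions.

After installation, set up the correct environment libraries as follows (the paths may need to be modified):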

echo 'source /opt/intel/bin/compilervars.sh intel64'>> ~/.bashrc
# alternatively edit <mkl_path>/mkl/bin/mklvars.sh replacing INSTALLDIR in
# CPRO_PATH=<INSTALLDIR> with the actual mkl path: CPRO_PATH=<full mkl path>
# echo 'source <mkl path>/mkl/bin/mklvars.sh intel64'>> ~/.bashrc

Clone Caffe optimized for Intel architecture and prepare it for compilation:

cd ~
# For BVLC caffe use:
# git clone https://github.com/BVLC/caffe.git
# For intel caffe use:
git clone https://github.com/intel/caffe.git
cd caffe
echo "export CAFFE_ROOT=`pwd`">> ~/.bashrc
source ~/.bashrc
cp Makefile.config.example Makefile.config
# Open Makefile.config and modify it (see comments in the Makefile)
vi Makefile.config

Edit Makefile.config:

# To run on CPU only and to avoid installing CUDA installers, uncomment
CPU_ONLY := 1

# To use MKL, replace atlas with mkl as follows
# (make sure that the BLAS_DIR and BLAS_LIB paths are correct)
BLAS := mkl
BLAS_DIR := $(MKLROOT)/include
BLAS_LIB := $(MKLROOT)/lib/intel64

# To use MKL2017 DNN primitives as the default engine, uncomment
# (however leave it commented if using multinode training)
# USE_MKL2017_AS_DEFAULT_ENGINE := 1

# To customize the compiler choice, uncomment and set the following
# CUSTOM_CXX := g++

# To train on multinode uncomment and verify path
# USE_MPI := 1
# CXX := /usr/bin/mpicxx

If using Ubuntu 16.04, edit Makefile:

INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/

and create symbolic links:

cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so

If using CentOS 7 and ATLAS (instead of the recommended MKL library), edit Makefile:

# Change this line
LIBRARIES += cblas atlas
# to
LIBRARIES += satlas

Build Caffe optimized for Intel architecture:

NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS
# To save the output stream to file makestdout.log use this instead
# make -j $NUM_THREADS 2>&1 | tee makestdout.log

Alternatively, cmake can be used:

mkdir build
cd build
cmake -DCPU_ONLY=on -DBLAS=mkl -DUSE_MKL2017_AS_DEFAULT_ENGINE=on /path/to/caffe
NUM_THREADS=$(($(grep 'core id' /proc/cpuinfo | sort -u | wc -l)*2))
make -j $NUM_THREADS

Install the Python dependencies:

# These steps are OPTIONAL but highly recommended to use the Python interface
sudo apt-get -y install gfortran python-dev python-pip
cd ~/caffe/python
for req in $(cat requirements.txt); do sudo pip install $req; done
sudo pip install scikit-image #depends on other packages
sudo ln -s /usr/include/python2.7/ /usr/local/include/python2.7
sudo ln -s /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ \
  /usr/local/include/python2.7/numpy
cd ~/caffe
make pycaffe -j $NUM_THREADS
echo "export PYTHONPATH=$CAFFE_ROOT/python">> ~/.bashrc
source ~/.bashrc

Other installation options:

# These steps are OPTIONAL to test caffe
make test -j $NUM_THREADS
make runtest #"YOU HAVE <some number> DISABLED TESTS" output is OK

# This step is OPTIONAL to disable cam hardware OpenCV driver
# alternatively, the user can skip this and ignore the harmless
# libdc1394 error that may occasionally appear
sudo ln /dev/null /dev/raw1394

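Data layer

This section covers the different data layer types. It is optional: Caffe can be used without it, but it may be helpful if you plan to use data in different formats. It is based on this article and this tutorial, as well as src/caffe/proto/caffe.proto.

Data enters Caffe through data layers, which lie at the bottom of the network and are defined in a prototxt file. More information on prototxt files is in the Training section. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from files on disk in HDF5 or common image formats.

Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available by specifying transform_param (not supported by all data types, e.g., HDF5). This option is unnecessary in the data layer if the desired transformations were applied beforehand. Common data transformations can be specified as follows: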

  transform_param {
    # randomly horizontally mirror the image
    mirror: 1
    # crop a `crop_size` x `crop_size` patch:
    # - at random during training
    # - from the center during testing
    crop_size: 227
    # subtract mean value: these mean_values can equivalently be replaced with a mean.binaryproto file as
    # mean_file: name_of_mean_file.binaryproto
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }

In this example, the image is cropped, mirrored, and mean-subtracted. For other common data transformations, see message TransformationParameter in src/caffe/proto/caffe.proto.
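The cropping, mirroring, and mean-subtraction steps described above can be sketched in plain Python. This is only an illustration on nested lists, not Caffe's implementation; the function name transform, its arguments, and the 50% mirroring probability are assumptions made for this sketch.

```python
import random

def transform(image, crop_size, mean_values, mirror=False, train=False, rng=random):
    """Crop, optionally mirror, and mean-subtract a C x H x W image (nested lists)."""
    height, width = len(image[0]), len(image[0][0])
    if train:
        # random crop position during training
        top = rng.randint(0, height - crop_size)
        left = rng.randint(0, width - crop_size)
    else:
        # center crop during testing
        top = (height - crop_size) // 2
        left = (width - crop_size) // 2
    # assumption: mirror with probability 0.5 when mirroring is enabled
    flip = mirror and rng.random() < 0.5
    out = []
    for channel, mean in zip(image, mean_values):
        rows = []
        for row in channel[top:top + crop_size]:
            patch = [value - mean for value in row[left:left + crop_size]]
            rows.append(patch[::-1] if flip else patch)
        out.append(rows)
    return out

# toy 3-channel 4x4 image whose pixel value equals its column index
image = [[[float(x) for x in range(4)] for _ in range(4)] for _ in range(3)]
center = transform(image, crop_size=2, mean_values=[1.0, 1.0, 1.0])
```

With train=False the crop is taken from the center, so the result is deterministic; with train=True the crop position is drawn at random on every call.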

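Data

The Lightning Memory-Mapped Database (LMDB) and LevelDB database formats can be processed efficiently as input data. They are only suitable for 1-of-k classification. Because Caffe reads these datasets efficiently, they are the recommended data formats for 1-of-k classification.

data_params

Required

source: the name of the directory containing the database
batch_size: the number of inputs to process at one time

Optional

backend [default LEVELDB]: choose whether to use LEVELDB or LMDB
rand_skip: skip up to this number of inputs at the beginning; this can be useful for asynchronous SGD

For other data layer parameters, see message DataParameter in src/caffe/proto/caffe.proto.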

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: 1
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}

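It is common, but not required, for a layer and its top blob to have the same name; that is, in the prototxt file the name and top of each layer are usually the same.

Alternatively, the mean can be subtracted by passing a mean image, replacing all the mean_value lines with mean_file: "data/ilsvrc12/imagenet_mean.binaryproto". The binaryproto file can be generated from an LMDB dataset as follows:

cd ~/caffe
build/tools/compute_image_mean examples/imagenet/ilsvrc12_train_lmdb \
  data/ilsvrc12/imagenet_mean.binaryproto

Replace examples/imagenet/ilsvrc12_train_lmdb and data/ilsvrc12/imagenet_mean.binaryproto with the appropriate lmdb folder and the desired binaryproto file.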

ImageData

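Gets images and labels directly from image files.

image_data_params

Required

source: the name of a text file that lists the path of each input and its label

Optional

batch_size [default 1]: the number of inputs to process at one time
new_height [default 0]: resize the height to this value; ignored if set to 0
new_width [default 0]: resize the width to this value; ignored if set to 0
shuffle [default 0]: shuffle the data; ignored if set to 0
rand_skip [default 0]: skip up to this number of inputs at the beginning; this can be useful for asynchronous SGD

For other image data parameters, see message ImageDataParameter in src/caffe/proto/caffe.proto.

In this example, the images are shuffled, cropped, mirrored, and mean-subtracted.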

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    shuffle: 1
  }
}

Note that the text file lists the image file names and their corresponding labels. For example, "train.txt" looks like

/path/to/images/img3423.jpg 2
/path/to/images/img3424.jpg 13
/path/to/images/img3425.jpg 8
...

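Input

Uses a blob of zeros of the specified dimensions as input data. This is typically used for timing the forward and backward passes. For more information on timing a network, see the end of the Training section.

input_params

Required

shape: used to define one or more shapes for the top blobs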

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}

Alternatively, this layer can be written as:

input: "data"
input_dim: 32
input_dim: 3
input_dim: 227
input_dim: 227

DummyData

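Similar to Input, except that the type of data can be specified. This is typically used for debugging, but it can also be used for timing the forward and backward passes. See an example here.

dummy_data_params

Required

shape: used to define one or more shapes for the top blobs

Optional

data_filler [default ConstantFiller with value of 0]: specifies the values used to fill the top blob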

layer {
  name: "data"
  type: "DummyData"
  top: "data"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
      value: 0.01
    }
    shape {
      dim: 32
      dim: 3
      dim: 227
      dim: 227
    }
  }
}
layer {
  name: "data"
  type: "DummyData"
  top: "label"
  include {
    phase: TRAIN
  }
  dummy_data_param {
    data_filler {
      type: "constant"
    }
    shape {
      dim: 32
    }
  }
}

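This example has two data layers because the data for each top blob must be specified separately. Note that in the Data, ImageData, and HDF5Data layers, the information for the label top blob is in the source file.

MemoryData

The memory data layer reads data directly from memory without copying it. To use it, call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) to specify a contiguous data source (as a 4D row-major array), which is read one batch-sized chunk at a time.

This method can be slow because the data must first be copied into memory. Once the data is in memory, however, it is very efficient.

memory_data_param

Required

batch_size, channels, height, width: specify the size of the chunks to read from memory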

layers {
  name: "data"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  transform_param {
    crop_size: 227
    mirror: true
    mean_file: "mean.binaryproto"
  }
  memory_data_param {
   batch_size: 32
   channels: 3
   height: 227
   width: 227
  }
}

HDF5Data

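Reads arbitrary data from HDF5 files. Suitable for tasks that use only FP32 and FP64 data (not uint8); as a result, image data becomes very large. transform_param is not allowed. Use only when necessary.

hdf5_data_param

Required

source: the name of the text file that lists the paths to the input data and labels
batch_size

Optional

shuffle [default false]: shuffle the HDF5 files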

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "examples/hdf5_classification/data/train.txt"
    batch_size: 32
  }
}

HDF5DataOutput

The HDF5 output layer performs the opposite function of the other layers in this section: it writes its input blobs to disk.

hdf5_output_param

Required

file_name

layer {
  name: "data_output"
  type: "HDF5Output"
  bottom: "data"
  bottom: "label"
  include {
    phase: TRAIN
  }
  hdf5_output_param {
    file_name: "output_file.h5"
  }
}

WindowData

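Used specifically for detection. Reads windows from image files together with their class labels.

window_data_param

Required

source: specifies the data source
mean_file
batch_size

Optional

mirror
crop_size: randomly crop the image
crop_mode [default "warp"]: the mode used to crop detection windows; "warp" warps the window to a fixed size, and "square" crops the tightest square around the window
fg_threshold [default 0.5]: foreground (object) overlap threshold
bg_threshold [default 0.5]: background (non-object) overlap threshold
fg_fraction [default 0.25]: the fraction of the batch that should be foreground objects
context_pad [default 10]: the amount of contextual padding around a window

For other window data parameters, see message WindowDataParameter in src/caffe/proto/caffe.proto.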

layer {
  name: "data"
  type: "WindowData"
  top: "data"
  top: "label"
  window_data_param {
    source: "/path/to/file/window_train.txt"

    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
    batch_size: 128
    mirror: true
    crop_size: 227
    fg_threshold: 0.5
    bg_threshold: 0.5
    fg_fraction: 0.25
    context_pad: 16
  }
}

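Dataset preparation

The recommended data format for 1-of-k classification is LMDB. Creating an LMDB with the Caffe tools requires:

a folder with the data
an output folder that must not already exist, e.g., mydataset_train_lmdb
a text file with the image file names and corresponding labels, e.g., "train.txt", which looks like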

img3423.jpg 2
img3424.jpg 13
img3425.jpg 8
...

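Note that if the data is spread across different folders, train.txt can contain the full path of each data point.

create_label_file.py is a simple script that creates the training and validation text files for Kaggle's Dogs vs Cats competition, and it can easily be adapted to other tasks.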
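As a rough illustration of what such a label-file script does, the sketch below builds "file label" lines from file names. It is not the create_label_file.py script itself; the helper name make_label_lines, the file-name pattern (class name before the first dot, as in cat.0.jpg), and the class ids are assumptions for this example.

```python
def make_label_lines(file_names, class_ids):
    """Return 'file_name label' lines for names such as 'cat.0.jpg'."""
    lines = []
    for name in sorted(file_names):
        prefix = name.split('.')[0]  # hypothetical naming: 'cat' or 'dog'
        lines.append('%s %d' % (name, class_ids[prefix]))
    return lines

names = ['dog.0.jpg', 'cat.1.jpg', 'cat.0.jpg']
lines = make_label_lines(names, {'cat': 0, 'dog': 1})
# the lines can then be written to train.txt, one per line
text = '\n'.join(lines) + '\n'
```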

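Note that in testing we assume the labels are missing. If labels are available, the same steps can be used to prepare an LMDB test dataset.

Preparing data with three channels (e.g., RGB images)

The example below (based on this) generates a training LMDB and requires train.txt. It can be run from the $CAFFE_ROOT directory.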

#!/usr/bin/env sh
# folder containing the training and validation images
TRAIN_DATA_ROOT=/path/to/training/images

# folder containing the file with the name of training images
DATA=/path/to/file
# folder for the lmdb datasets
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools

# Set to resize the images to 256x256
RESIZE_HEIGHT=256
RESIZE_WIDTH=256
echo "Creating train lmdb..."

# Delete the shuffle line if shuffle is not desired
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT/ \
    $DATA/train.txt \
    $OUTPUT/mydataset_train_lmdb
echo "Done."

Computing the mean of the images in an LMDB dataset:

#!/usr/bin/env sh
# Compute the mean image in lmdb dataset
# folder for the lmdb datasets and output for mean image
OUTPUT=/path/to/output/directory
TOOLS=/path/to/caffe/build/tools

$TOOLS/compute_image_mean $OUTPUT/mydataset_train_lmdb \
  $OUTPUT/train_mean.binaryproto

$TOOLS/compute_image_mean $OUTPUT/mydataset_val_lmdb \
  $OUTPUT/val_mean.binaryproto

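Preparing data with a different number of channels

Grayscale images (one channel), RADAR images (two channels), videos (four channels), image plus depth (four channels), vibrometry (one channel), and spectrograms (one channel) require a wrapper to set up the LMDB dataset (see this blog's script as a guide).

Resizing images

There are two common ways to resize an image:

Warp the image to the desired size
Resize proportionally so that the smaller dimension matches the desired size, then center-crop the larger dimension to the desired size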
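The geometry of the second approach (proportional resize followed by a center crop) can be worked out as in the sketch below. It only computes sizes and crop boxes and does not touch any image data; the function name resize_then_center_crop and the 640x480 example are assumptions for illustration.

```python
def resize_then_center_crop(width, height, target):
    """Return the resized (width, height) and the center-crop box
    (left, top, right, bottom) for a square target size."""
    # scale so that the smaller dimension becomes `target`
    scale = float(target) / min(width, height)
    new_w = max(target, int(round(width * scale)))
    new_h = max(target, int(round(height * scale)))
    # crop the larger dimension around its center
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)

size, box = resize_then_center_crop(640, 480, 256)
```

For a 640x480 input and a 256x256 target, the image is first resized to 341x256 and the crop keeps columns 42 through 297, so the aspect ratio is preserved at the cost of discarding the left and right borders.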

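Resizing can be done in several ways:

With OpenCV* while creating the LMDB, e.g., build/tools/convert_imageset --resize_height=256 --resize_width=256 warps the images to the desired size; convert_imageset calls ReadImageToDatum, which calls ReadImageToCVMat in caffe/src/util/io.cpp
With ImageMagick, e.g., convert -resize 256x256\! <input_img> <output_img> warps the image to the desired size
With OpenCV, using the multithreaded image conversion script tools/extra/resize_and_crop_images.py, which resizes proportionally and then center-crops. This requires: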

sudo pip install git+https://github.com/Yangqing/mincepie.git
sudo apt-get install -y python-opencv
vi tools/extra/launch_resize_and_crop_images.sh # set number of clients (use num_of_cores*2); file.txt, input, and output folders

In addition, images can be cropped or resized as part of the data layer:

layer {
  name: "data"
  transform_param {
    crop_size: 227
...
}

crops the image (at random during training; at the center during testing), and

layer {
  name: "data"
  image_data_param {
    new_height: 227
    new_width: 227
...
}

resizes the images to new_height and new_width using OpenCV.

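Training

Training requires:

train_val.prototxt: defines the network architecture, initialization parameters, and local learning rates
solver.prototxt: defines the optimization/training parameters and serves as the actual file that is called to train a deep network
deploy.prototxt: used only in testing. It must be identical to train_val.prototxt except for the input layer(s), the loss layer(s), and the weight initialization (e.g., weight_filler); the latter two do not exist in deploy.prototxt.

It is common, but not required, for layers and the blobs in them to have the same name. In the prototxt file, the name and top of each layer are usually the same.

For a description of each layer, see here. Parameter initialization is very important; the parameters are set here. Some other helpful tips:

weight_filler initialization (for ReLU units, MSRAFiller is usually better than xavier, which is usually better than gaussian; note that for MSRAFiller and xavier there is no need to specify std manually)
gaussian: samples weights from a Gaussian distribution N(0,std)
xavier: samples weights from a uniform distribution U(-a,a), where a=sqrt(3/fan_in) and fan_in is the number of incoming inputs
MSRAFiller: samples weights from a normal distribution N(0,a), where a=sqrt(2/fan_in)
base_lr: the initial learning rate (the default is .01; change it to a smaller value if a NaN loss occurs during training)
lr_mult: the lr_mult of the biases is usually set to twice the lr_mult of the non-bias weights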
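The three weight fillers listed above can be sketched with the Python standard library as follows. This is an illustration of the sampling rules only, not Caffe's filler code; the function names are hypothetical, and for the MSRA case the value a=sqrt(2/fan_in) is interpreted as the standard deviation.

```python
import math
import random

def gaussian_filler(n, std, rng):
    # N(0, std): std must be chosen by hand
    return [rng.gauss(0.0, std) for _ in range(n)]

def xavier_filler(n, fan_in, rng):
    # U(-a, a) with a = sqrt(3 / fan_in)
    a = math.sqrt(3.0 / fan_in)
    return [rng.uniform(-a, a) for _ in range(n)]

def msra_filler(n, fan_in, rng):
    # normal distribution with (assumed) standard deviation sqrt(2 / fan_in)
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(n)]

rng = random.Random(0)
fan_in = 5 * 5 * 20  # e.g., a 5x5 kernel over 20 input feature maps
weights = xavier_filler(1000, fan_in, rng)
bound = math.sqrt(3.0 / fan_in)
```

Because the xavier and MSRA scales are derived from fan_in, no std needs to be supplied for them, which matches the tip above.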

The LeNet example files lenet_train_test.prototxt, deploy.prototxt, and solver.prototxt are described below (with comments explaining what each parameter means):

solver.prototxt

# The train/validation net protocol buffer definition, that is, the training architecture
net: "examples/mnist/lenet_train_test.prototxt"

# Note: 1 iteration = 1 forward pass over all the images in one batch

# Carry out a validation test every 500 training iterations.
test_interval: 500

# test_iter specifies how many forward passes the validation test should carry out
#  a good number is num_val_imgs / batch_size (see batch_size in Data layer in phase TEST in train_test.prototxt)
test_iter: 100

# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005

# We want to initially move fast towards the local minimum and as we approach it, we want to move slower
# To this end, there are various learning rates policies available:
#  fixed: always return base_lr.
#  step: return base_lr * gamma ^ (floor(iter / step))
#  exp: return base_lr * gamma ^ iter
#  inv: return base_lr * (1 + gamma * iter) ^ (- power)
#  multistep: similar to step but it allows non uniform steps defined by stepvalue
#  poly: the effective learning rate follows a polynomial decay, to be zero by the max_iter: return base_lr * (1 - iter/max_iter) ^ (power)
#  sigmoid: the effective learning rate follows a sigmoid decay: return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
lr_policy: "step"
gamma: 0.1
stepsize: 10000 # Drop the learning rate in steps by a factor of gamma every stepsize iterations

# Display every 100 iterations
display: 100

# The maximum number of iterations
max_iter: 10000

# snapshot intermediate results, that is, every 5000 iterations it saves a snapshot of the weights
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_multistep"

# solver mode: CPU or GPU
solver_mode: CPU
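The learning rate policies listed in the solver comments can be evaluated with a small helper to see how a schedule behaves before training. The sketch below transcribes the formulas from those comments; the function name learning_rate and its default arguments are assumptions, and multistep is omitted for brevity.

```python
import math

def learning_rate(policy, it, base_lr, gamma=0.1, stepsize=1, power=1.0, max_iter=1):
    """Learning rate at iteration `it` for the policies in the solver comments."""
    if policy == 'fixed':
        return base_lr
    if policy == 'step':
        return base_lr * gamma ** (it // stepsize)
    if policy == 'exp':
        return base_lr * gamma ** it
    if policy == 'inv':
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == 'poly':
        return base_lr * (1.0 - float(it) / max_iter) ** power
    if policy == 'sigmoid':
        return base_lr / (1.0 + math.exp(-gamma * (it - stepsize)))
    raise ValueError('unknown policy: %s' % policy)

# the solver above: lr_policy "step", base_lr 0.01, gamma 0.1, stepsize 10000
start = learning_rate('step', 0, 0.01, gamma=0.1, stepsize=10000)
after_drop = learning_rate('step', 10000, 0.01, gamma=0.1, stepsize=10000)
```

With stepsize equal to max_iter, as in the solver above, the rate stays at base_lr for the whole run and would only drop by a factor of gamma at iteration 10000.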

Train the network:

# The name of the output file (aka the trained weights) is in solver.prototxt
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt

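Training produces two types of files (10000 is the number of completed iterations):

lenet_multistep_10000.caffemodel: the weights of the architecture, used in testing
lenet_multistep_10000.solverstate: used to resume training from the current iteration if training halts (e.g., a power outage)

To train the network and plot the validation accuracy or loss versus iterations: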

#CHART_TYPE=[0-7]
#  0: Test accuracy  vs. Iters
#  1: Test accuracy  vs. Seconds
#  2: Test loss  vs. Iters
#  3: Test loss  vs. Seconds
#  4: Train learning rate  vs. Iters
#  5: Train learning rate  vs. Seconds
#  6: Train loss  vs. Iters
#  7: Train loss  vs. Seconds
CHART_TYPE=0
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt 2>&1 | tee logfile.log
python $CAFFE_ROOT/tools/extra/plot_training_log.py.example $CHART_TYPE name_of_plot.png logfile.log

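Dropout can be used with fully connected layers. On each forward pass it zeroes out a different random subset of activations (the fraction is set by dropout_ratio), which reduces overfitting by preventing co-adaptation between weights. It is ignored in testing.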

layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
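The behavior of the Dropout layer can be sketched in a few lines of Python. This is a minimal illustration, not Caffe's code: the function name dropout_forward is hypothetical, and the rescaling of the surviving values by 1/(1 - dropout_ratio) (the common "inverted dropout" formulation) is an assumption of this sketch rather than something stated in this article.

```python
import random

def dropout_forward(activations, dropout_ratio, train, rng):
    """Zero each activation with probability dropout_ratio during training."""
    if not train:
        # dropout is ignored in testing: values pass through unchanged
        return list(activations)
    # assumed rescaling so the expected sum of the outputs is unchanged
    scale = 1.0 / (1.0 - dropout_ratio)
    return [a * scale if rng.random() >= dropout_ratio else 0.0
            for a in activations]

rng = random.Random(0)
acts = [1.0] * 8
train_out = dropout_forward(acts, 0.5, True, rng)
test_out = dropout_forward(acts, 0.5, False, rng)
```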

To time the forward and backward passes (not the weight updates):

# Computes 50 iterations and returns forward, backward, and total time and the average
# note that the training samples and mean.binaryproto may be required or
# alternatively, use dummy variables
NUMITER=50
/path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

To ensure accurate timing, the Linux utility numactl can be used to allocate the memory buffers in MCDRAM:

numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

Model Zoo

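The Caffe Model Zoo is a collection of trained deep learning models and/or prototxt files for a variety of tasks. These models can be used for fine-tuning or testing.

Multinode distributed training

This section is based on Intel's Caffe GitHub wiki. There are two main approaches to distributing training across multiple nodes: model parallelism and data parallelism. In model parallelism, the model is divided among the nodes and each node has the full data batch. In data parallelism, the data batch is divided among the nodes and each node has the full model. Data parallelism is especially useful when the number of weights in the model is small and the data batch is large. A hybrid of model and data parallelism is also possible, for example by training the layers with fewer weights (such as convolutional layers) with data parallelism and the layers with more weights (such as fully connected layers) with model parallelism. Intel has published a theoretical analysis of how to optimally trade off data and model parallelism in this hybrid approach.

Given the popularity of deep networks with fewer weights, such as GoogleNet and ResNet, and the success of distributed training with data parallelism, Caffe optimized for Intel architecture supports data parallelism. Multinode distributed training is under active development, and new features are being evaluated.

To train across multiple nodes, make sure these two lines are in Makefile.config: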

USE_MPI := 1
# update with the path to binary MPI library
CXX := /usr/bin/mpicxx

Using multiple nodes is then straightforward:

mpirun --hostfile path/to/hostfile -n <num_processes> /path/to/caffe/build/tools/caffe train --solver=/path/to/solver.prototxt --param_server=mpi

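Here <num_processes> is the number of nodes to use, and hostfile lists the IP address of each node, one per line. Note that solver.prototxt points to the train.prototxt on each node, and each train.prototxt needs to point to a different part of the dataset. For more details, click here.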
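Since each node needs its own part of the dataset, one simple way to prepare the per-node listing files is to split the lines of a train.txt file round-robin. The sketch below only illustrates that idea; the helper shard_lines and the toy file names are hypothetical, and this is not the procedure documented by the Intel Caffe wiki.

```python
def shard_lines(lines, num_nodes):
    """Split listing lines round-robin so each node gets a distinct part."""
    return [lines[rank::num_nodes] for rank in range(num_nodes)]

# toy listing in the "file label" format used by train.txt
lines = ['img%d.jpg %d' % (i, i % 2) for i in range(5)]
shards = shard_lines(lines, 2)
# shards[rank] would be written to the listing file used on node `rank`
```

Round-robin assignment keeps the shards within one line of each other in size and, if the listing is already shuffled, keeps the class mix on each node similar.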

Fine-tuning

Reuse the layer-definition prototxt file and make two changes.

1. Change the data layer to include the new data (note that the scale is 1/255):

layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "newdata_lmdb" # CHANGED THIS LINE TO THE NEW DATASET
    batch_size: 64
    backend: LMDB
  }
}

2. Change the last layer, ip2 in this example (for testing, make the same change in the deploy.prototxt file):

layer {
  name: "ip2-ft" # CHANGED THIS LINE
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2-ft" # CHANGED THIS LINE
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 2 #CHANGED THIS LINE TO THE NUMBER OF CLASSES IN NEW DATASET
    bias_filler {
      type: "constant"
    }
  }
}

Invoke Caffe:

#From the command line on $CAFFE_ROOT
./build/tools/caffe train -solver /path/to/solver.prototxt -weights  /path/to/trained_model.caffemodel

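Fine-tuning guidelines

Learn the last layer first (the weights of earlier layers will not change much during fine-tuning)
Reduce the initial learning rate (in solver.prototxt) by 10x or 100x
Caffe layers support local learning rates: lr_mult
Freeze all layers except the last one (and perhaps the second to last) for fast fine-tuning, that is, set lr_mult=0 as the local learning rate
Increase the local learning rate of the last layer by 10x and of the second-to-last layer by 5x
Stop when the result is good enough, or keep fine-tuning other layers

What happens under the hood:

A new network is created
The previous weights are copied to initialize the network weights
The network is solved in the usual way (see the example)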
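The effect of the guidelines above on each layer can be checked with simple arithmetic, because a layer's local learning rate is the solver's base_lr multiplied by that layer's lr_mult. The sketch below uses hypothetical lr_mult values for the LeNet layer names; the numbers are illustrative assumptions, not recommended settings.

```python
def effective_lr(base_lr, lr_mult):
    """Local learning rate of a layer: base_lr scaled by its lr_mult."""
    return base_lr * lr_mult

base_lr = 0.01 / 10  # initial learning rate reduced 10x for fine-tuning
# hypothetical multipliers: frozen early layer, 5x second to last, 10x last
lr_mults = {'conv1': 0, 'ip1': 5, 'ip2-ft': 10}
rates = {name: effective_lr(base_lr, mult) for name, mult in lr_mults.items()}
```

A layer with lr_mult=0 gets a learning rate of exactly zero, so its weights stay frozen at the copied values.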

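Testing

Testing typically refers to inference, classification, or scoring, done either in Python or with the native C++ utilities that ship with Caffe. To classify an image (or signal) or a set of images, the following are needed:

Image(s)
Network architecture
Network weights

Testing with the native C++ utility is less flexible; Python is recommended. The prototxt file with the model must have phase: TEST in the data layer (with the testing dataset) in order to test the model.

/path/to/caffe/build/tools/caffe test -model /path/to/train_val.prototxt \
  -weights /path/to/trained_model.caffemodel -iterations <num_iter>

This example is adapted from this blog. To classify an image with a pre-trained model, first download the pre-trained model: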

./scripts/download_model_binary.py models/bvlc_reference_caffenet

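Next, download the labels of the dataset (ILSVRC 2012 in this example), also called the synset file, which is needed to map a prediction to the name of the class: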

./data/ilsvrc12/get_ilsvrc_aux.sh

Then classify an image:

./build/examples/cpp_classification/classification.bin \
  models/bvlc_reference_caffenet/deploy.prototxt \
  models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
  data/ilsvrc12/imagenet_mean.binaryproto \
  data/ilsvrc12/synset_words.txt \
  examples/images/cat.jpg

The output should look like this:

---------- Prediction for examples/images/cat.jpg ----------
0.3134 - "n02123045 tabby, tabby cat"
0.2380 - "n02123159 tiger cat"
0.1235 - "n02124075 Egyptian cat"
0.1003 - "n02119022 red fox, Vulpes vulpes"
0.0715 - "n02127052 lynx, catamount"

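Feature extractor and visualization

In a convolutional layer, the weights from one layer to the next are represented by a blob of shape output_feature_maps x input_feature_maps x height x width (feature maps are also referred to as channels). A network trained in Caffe can be used as a feature extractor in two ways. The first, and recommended, way is to use the Python API. The second is to use the native C++ utility that ships with Caffe: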

# Download model params
scripts/download_model_binary.py models/bvlc_reference_caffenet

# Generate a list of the files to process
# Use the images that ship with caffe
find `pwd`/examples/images -type f -exec echo {} \; > examples/images/test.txt

# Add a 0 to the end of each line
# input data structures expect labels after each image file name
sed -i "s/$/ 0/" examples/images/test.txt

# Get the mean of the training set to subtract it from the images
./data/ilsvrc12/get_ilsvrc_aux.sh

# Copy and modify the data layer to load and resize the images:
cp examples/feature_extraction/imagenet_val.prototxt examples/images
vi examples/images/imagenet_val.prototxt

# Extract features
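./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
  examples/images/imagenet_val.prototxt fc7 examples/images/features 10 lmdb

The feature blob is extracted from fc7 above, which represents the highest-level feature of the reference model. Other layers, such as conv5 or pool3, can be used instead. The last two arguments are the number of mini-batches to process (10) and the database format (lmdb). The features are stored in the LMDB examples/images/features, where other code can access them.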

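Using the Python* API

This section is optional; Caffe can be used without it. It is based on this blog. The Python interface makes testing, classification, and feature extraction easy, and it can also be used to define and train networks.

Setting up Python Caffe

Make sure make pycaffe was called when compiling Caffe. In Python, first import the caffe module: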

# Make sure that caffe is on the python path:
# (alternatively set the PYTHONPATH variable as explained in the installation)
import sys
CAFFE_ROOT = '/path/to/caffe/'
sys.path.insert(0, CAFFE_ROOT + 'python')
import caffe
caffe.set_mode_cpu()

Loading the network architecture

The network architecture is in the train_val.prototxt or deploy.prototxt file. To load the network:

net = caffe.Net('train_val.prototxt', caffe.TRAIN)

Alternatively, to load a specific set of weights:

net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)

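The reason for using caffe.TRAIN is that caffe.TEST crashes if run twice, and caffe.TRAIN appears to give the same results.

net contains the data blobs (net.blobs) and the parameter weight blobs (net.params). In the commands below, conv1 can be replaced with the name of any other layer:

net.blobs['conv1']: the data output of the conv1 layer, also referred to as feature maps
net.params['conv1'][0]: the weight blob of the conv1 layer
net.params['conv1'][1]: the bias blob of the conv1 layer
net.blobs.items(): returns the data blobs of all the layers; useful in a for loop that iterates over the layers

Network visualization

To draw the network, first install the pydot module and graphviz: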

sudo apt-get install -y graphviz
sudo pip install pydot

Run the draw_net.py Python script:

python python/draw_net.py examples/net_surgery/deploy.prototxt train_val_net.png
open train_val_net.png

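Data input

Feed data into the data layer blob using one of the following techniques:

Modify the data layer to match the size of the image:

import numpy as np
from PIL import Image
# get input image and arrange it as a 4-D tensor
im = np.array(Image.open('/path/to/caffe/examples/images/cat_gray.jpg'))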
im = im[np.newaxis, np.newaxis, :, :]
# resize the blob to be the size of the input image
net.blobs['data'].reshape(im.shape) # if the image input is different
# compute the blobs given the input data
net.blobs['data'].data[...] = im

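Modify the input data to match the size of the expected input of the data layer:

im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
shape = net.blobs['data'].data.shape
# resize the image to the height and width of the data blob
im = caffe.io.resize_image(im, (shape[2], shape[3]))
# move channels first (H x W x C -> C x H x W) and fill the data blob;
# assumes the image has as many channels as the data blob
net.blobs['data'].data[...] = im.transpose(2, 0, 1)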

Commonly used input data transformations include:

net = caffe.Net('deploy.prototxt', 'trained_model.caffemodel', caffe.TRAIN)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
ilsvrc_mean = 'python/caffe/imagenet/ilsvrc_2012_mean.npy'
transformer.set_mean('data', np.load(ilsvrc_mean).mean(1).mean(1))
# puts the channel as the first dimension
transformer.set_transpose('data', (2,0,1))
# (2,1,0) maps RGB to BGR for example
transformer.set_channel_swap('data', (2,1,0))
transformer.set_raw_scale('data', 255.0)
# the batch size can be changed on-the-fly
net.blobs['data'].reshape(1,3,227,227)
# load the image in the data layer
im = caffe.io.load_image('/path/to/caffe/examples/images/cat_gray.jpg')
# transform the image and store it in the net.blob
net.blobs['data'].data[...] = transformer.preprocess('data', im)

To view im:

import matplotlib.pyplot as plt
plt.imshow(im)

Inference

The prediction for the input image can be computed as follows:

# assumes that images are loaded
prediction = net.forward()
print 'predicted class:', prediction['prob'].argmax()

To time the forward pass (ignoring the data preprocessing time):

timeit net.forward()

Another module that can transform data and classify different data inputs is caffe.Classifier. That is, caffe.Classifier can be used in place of caffe.Net and caffe.io.Transformer.

im1 = caffe.io.load_image('/path/to/caffe/examples/images/cat.jpg')
im2 = caffe.io.load_image('/path/to/caffe/examples/images/fish-bike.jpg')
imgs = [im1, im2]
ilsvrc_mean = '/path/to/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy'
net = caffe.Classifier('deploy.prototxt', 'trained_model.caffemodel',
                       mean=np.load(ilsvrc_mean).mean(1).mean(1),
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))
prediction = net.predict(imgs) # predict takes any number of images
print 'predicted classes:', prediction[0].argmax(), prediction[1].argmax()

If using a folder with many images, replace imgs as follows:

IMAGES_FOLDER = '/path/to/folder/w/images/'
import os
images = os.listdir(IMAGES_FOLDER)
imgs = [ caffe.io.load_image(IMAGES_FOLDER + im) for im in images ]

The entire test set may not fit in memory. Predictions can therefore be computed in batches, for example, batches of 100 images.
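The batching mentioned above can be sketched with a small generator that yields fixed-size slices of a list. The helper name batches and the toy file names are assumptions for illustration; each yielded batch would be loaded and passed to the classifier in turn.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

paths = ['img%d.jpg' % i for i in range(250)]
sizes = [len(batch) for batch in batches(paths, 100)]
```

Only one batch needs to be held in memory at a time; the last batch is simply smaller when the number of images is not a multiple of the batch size.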

To view the probabilities of all the classes for im1 in a plot:

plt.plot(prediction[0])

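To time the full classification pipeline for one image (with oversampling), including the transformation of im1. Oversampling takes 10 crops of the image: the center and the corners, plus their mirrored versions: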

timeit net.predict([im1])
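The 10 crops used in oversampling (four corners and the center, each plain and mirrored) can be enumerated as in the sketch below. It only computes crop positions and is not the pycaffe implementation; the function name oversample_boxes and the 256x256 to 227x227 sizes are assumptions for this example.

```python
def oversample_boxes(image_size, crop_size):
    """Return (top, left, mirrored) for 4 corners + center, plain and mirrored."""
    height, width = image_size
    crop_h, crop_w = crop_size
    tops = (0, height - crop_h)
    lefts = (0, width - crop_w)
    corners = [(t, l) for t in tops for l in lefts]
    center = ((height - crop_h) // 2, (width - crop_w) // 2)
    positions = corners + [center]
    # every position is used twice: as is, and horizontally mirrored
    return [(t, l, m) for m in (False, True) for (t, l) in positions]

boxes = oversample_boxes((256, 256), (227, 227))
```

Each image therefore costs ten forward passes, which is why the center-only setting is considerably faster.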

If oversample is set to false, only the center is cropped:

timeit net.predict([im1], oversample=0)

Feature extraction and visualization

To examine the data in a specific layer, e.g., fc7:

net.blobs['fc7'].data

To retrieve details of the network layers and their shapes:

# Retrieve details of the network's layers
[(k, v.data.shape) for k, v in net.blobs.items()]

# Retrieve weights of the network's layers
[(k, v[0].data.shape) for k, v in net.params.items()]

# Retrieve the features in the last fully connected layer
# prior to outputting class probabilities
feat = net.blobs['fc7'].data[4]

# Retrieve size/dimensions of the array
feat.shape

Blob visualization:

# Assumes that the "net = caffe.Classifier" module has been called
# and data has been formatted as in the example above

# Take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) section in a grid
# of size approx. sqrt(n) by sqrt(n)
def vis_square(data, padsize=1, padval=0):
    # values between 0 and 1
    data -= data.min()
    data /= data.max()

    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))

    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

    plt.imshow(data)

plt.rcParams['figure.figsize'] = (25.0, 20.0)

# visualize the weights after the 1st conv layer
net.params['conv1'][0].data.shape
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))

# visualize the feature maps after 1st conv layer
net.blobs['conv1'].data.shape
feat = net.blobs['conv1'].data[0,:96]
vis_square(feat, padval=1)

# visualize the feature maps after the 2nd conv layer
net.blobs['conv2'].data.shape
feat = net.blobs['conv2'].data[0,:96]
vis_square(feat, padval=1)

# visualize the feature maps after the 2nd pool layer
net.blobs['pool2'].data.shape
feat = net.blobs['pool2'].data[0,:256] # change 256 to number of pool outputs
vis_square(feat, padval=1)

# Visualize the neuron activations for the 2nd fully-connected layer
net.blobs['ip2'].data.shape
feat = net.blobs['ip2'].data[0]
plt.plot(feat.flat)
plt.legend()
plt.show()

Defining a network

A network can be defined in Python and saved to a prototxt file:

from caffe import layers as L
from caffe import params as P

def lenet(lmdb, batch_size):
    # auto generated LeNet
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb, transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)
    return n.to_proto()

with open('examples/mnist/lenet_auto_train.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_train_lmdb', 64)))

with open('examples/mnist/lenet_auto_test.prototxt', 'w') as f:
    f.write(str(lenet('examples/mnist/mnist_test_lmdb', 100)))

The code above generates the following prototxt file:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00392156862745
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 50
    kernel_size: 5
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
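
As a quick sanity check on the definition above, the blob sizes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python. It assumes a single-channel 28x28 MNIST input, and the helper names conv_out and pool_out are made up for this illustration; they are not part of the Caffe API.

```python
# Sketch: trace spatial sizes and parameter counts through the LeNet above.
# Assumption: the input is a single-channel 28x28 MNIST image.

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Spatial output size of a pooling layer (rounds up)."""
    return -(-(size - kernel) // stride) + 1

channels, size = 1, 28
params = {}

# conv1: 20 filters of 5x5 over 1 input channel, plus 20 biases
params["conv1"] = 20 * channels * 5 * 5 + 20
channels, size = 20, conv_out(size, 5)
size = pool_out(size, 2, 2)            # pool1

# conv2: 50 filters of 5x5 over 20 input channels, plus 50 biases
params["conv2"] = 50 * channels * 5 * 5 + 50
channels, size = 50, conv_out(size, 5)
size = pool_out(size, 2, 2)            # pool2

# ip1 sees the flattened pool2 blob; ip2 maps 500 units to 10 classes
ip1_inputs = channels * size * size
params["ip1"] = ip1_inputs * 500 + 500
params["ip2"] = 500 * 10 + 10

total = sum(params.values())
print("pool2 spatial size:", size, "| ip1 inputs:", ip1_inputs)
print("parameters per layer:", params, "| total:", total)
```

Under these assumptions almost all of the parameters sit in ip1, which is typical for small networks that end in fully-connected layers.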

Training a network

Load the solver in Python and run a forward pass:

solver = caffe.get_solver('models/bvlc_reference_caffenet/solver.prototxt')
net = solver.net  # alias for the training net, so the gradients inspected later come from the net that ran backward()
solver.net.forward()  # train net
solver.test_nets[0].forward()  # test net (there can be more than one)

To compute the gradients:

solver.net.backward()

The gradient values can be inspected as follows:

# data gradients
net.blobs['conv1'].diff

# weight gradients
net.params['conv1'][0].diff

# biases gradients
net.params['conv1'][1].diff

To run one iteration (forward pass, backward pass, and weight update):

solver.step(1)

To run all the iterations defined in solver.prototxt (as max_iter):

solver.solve()
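
To make the forward, backward, and update sequence concrete, here is a toy sketch in plain Python. It is not Caffe code: the ToySolver class, the one-weight quadratic loss, and the learning rate and momentum values are all assumptions chosen for illustration. The update rule is SGD with momentum, where the velocity accumulates past gradients and is then added to the weight.

```python
# Toy illustration of a solver iteration: forward, backward, update.
# This is NOT Caffe code; loss, learning rate, and momentum are assumed values.

class ToySolver:
    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.w = 0.0      # single trainable weight
        self.v = 0.0      # momentum buffer (previous update)
        self.grad = 0.0   # plays the role of a .diff buffer
        self.iter = 0

    def forward(self):
        """Compute the loss L(w) = 0.5 * (w - 3)^2."""
        return 0.5 * (self.w - 3.0) ** 2

    def backward(self):
        """Compute dL/dw and store it."""
        self.grad = self.w - 3.0

    def update(self):
        """SGD with momentum: v = momentum * v - lr * grad, then w = w + v."""
        self.v = self.momentum * self.v - self.lr * self.grad
        self.w += self.v

    def step(self, n):
        """Run n iterations, each made of forward, backward, and update."""
        for _ in range(n):
            self.forward()
            self.backward()
            self.update()
            self.iter += 1

solver = ToySolver()
solver.step(1)
w_after_one = solver.w
solver.step(299)
print("w after 1 step:", w_after_one, "| w after 300 steps:", solver.w)
```

The weight moves toward the minimum at 3.0 and settles there after a few hundred iterations, which is the same pattern a real solver follows on a much larger parameter set.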

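Debugging

This section is optional and is intended only for Caffe developers.

A few debugging tips:

Remove randomness
Compare caffemodels
Use Caffe's debug info

Removing randomness makes behavior and outputs reproducible. Removing the randomness that comes from non-associative floating-point arithmetic is beyond the scope of this article.

Randomness can be introduced at several stages:

Weights are usually initialized randomly from a distribution (for example, Gaussian).
Images may be preprocessed by randomly flipping them horizontally or by randomly cropping different parts (for example, a 227x227 patch from a 256x256 image); the order of the images may also be randomly shuffled.
During training, a dropout layer randomly drops some units (and therefore the weights attached to them) and keeps the rest.

The remedy is to use a seed. Add the following line to solver.prototxt: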

# pick some value for random_seed that is greater or equal to 1, for example:
random_seed: 42

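This ensures that the same "random" values are used on every run. However, the same seed may produce different values on different devices. To be more robust across devices:

Prepare the data with the same image shuffle; do not reshuffle on every run.
In train.prototxt, in the transform_param of the ImageData layer, do not crop or mirror the images. If smaller images are needed, resize them in image_data_param: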

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
 #   mirror: true
 #   crop_size: 227
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
  image_data_param {
    source: "/path/to/file/train.txt"
    batch_size: 32
    new_height: 224
    new_width: 224
  }
}

In train.prototxt, set dropout_ratio: 0 in the dropout layers.
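
To illustrate why a fixed seed makes the random stages listed above repeatable, here is a minimal sketch using Python's standard random module rather than Caffe's own generator. The function name init_and_shuffle and the sizes used are assumptions made for the example. It also hints at the cross-device caveat: a different generator or seed yields a different stream.

```python
import random

def init_and_shuffle(seed):
    """Draw Gaussian 'weights' and an image order from one seeded generator."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.01) for _ in range(4)]  # random initialization
    order = list(range(8))
    rng.shuffle(order)                                  # random image order
    return weights, order

run_a = init_and_shuffle(42)
run_b = init_and_shuffle(42)   # same seed: identical weights and order
run_c = init_and_shuffle(7)    # different seed: a different random stream

print("same seed reproduces the run:", run_a == run_b)
print("different seed reproduces the run:", run_a == run_c)
```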

Other useful guidelines

In solver.prototxt, change lr_policy to fixed
In solver.prototxt, add the line debug_info: 1
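
A fixed learning-rate policy helps when comparing runs because the step size no longer depends on the iteration count. The sketch below contrasts the fixed policy with the step policy, which multiplies the base rate by gamma every stepsize iterations. The function learning_rate is a hypothetical helper written for this example, and the gamma and stepsize defaults are assumed values, not ones taken from any solver file in this article.

```python
def learning_rate(policy, base_lr, iteration, gamma=0.1, stepsize=1000):
    """Learning rate at a given iteration for two common policies.

    gamma and stepsize are illustrative defaults (assumptions).
    """
    if policy == "fixed":
        # The rate never changes
        return base_lr
    if policy == "step":
        # The rate drops by a factor of gamma every stepsize iterations
        return base_lr * gamma ** (iteration // stepsize)
    raise ValueError("unsupported policy: %s" % policy)

for it in (0, 999, 1000, 2500):
    print(it, learning_rate("fixed", 0.01, it), learning_rate("step", 0.01, it))
```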

To compare two caffemodels, the following script returns the sum of the absolute differences of all weights in the caffemodels:

# Intel Corporation
# Author: Ravi Panchumarthy

import argparse
import numpy as np

def get_args():
    parser = argparse.ArgumentParser('Compare weights of two caffe models')

    parser.add_argument('-m1', dest='modelFile1', type=str, required=True,
                        help='Caffe model weights file to compare')
    parser.add_argument('-m2', dest='modelFile2', type=str, required=True,
                        help='Caffe model weights file to compare against')
    parser.add_argument('-n', dest='netFile', type=str, required=True,
                        help='Network prototxt file associated with model')
    return parser.parse_args()

if __name__ == "__main__":
    import caffe

    args = get_args()
    net = caffe.Net(args.netFile, args.modelFile1, caffe.TRAIN)
    net2compare = caffe.Net(args.netFile, args.modelFile2, caffe.TRAIN)

    wt_sumOfAbsDiffByName = dict()
    bias_sumOfAbsDiffByName = dict()

    for name, blobs in net.params.items():  # items() works on both Python 2 and 3
        wt_diffTensor = np.subtract(net.params[name][0].data, net2compare.params[name][0].data)
        wt_absDiffTensor = np.absolute(wt_diffTensor)
        wt_sumOfAbsDiff = wt_absDiffTensor.sum()
        wt_sumOfAbsDiffByName.update({name : wt_sumOfAbsDiff})

        # if args.layerDebug == 1:
        #     print("%s : %s" % (name,wt_sumOfAbsDiff))

        # Some layers have no bias blob; skip the bias comparison for them
        if len(blobs) < 2:
            continue
        bias_diffTensor = np.subtract(net.params[name][1].data, net2compare.params[name][1].data)
        bias_absDiffTensor = np.absolute(bias_diffTensor)
        bias_sumOfAbsDiff = bias_absDiffTensor.sum()
        bias_sumOfAbsDiffByName.update({name : bias_sumOfAbsDiff})

    print("\nThe sum of absolute difference of all layer's weight is : %s" % sum(wt_sumOfAbsDiffByName.values()))
    print("The sum of absolute difference of all layer's bias is : %s" % sum(bias_sumOfAbsDiffByName.values()))

    finalDiffVal = sum(wt_sumOfAbsDiffByName.values())+ sum(bias_sumOfAbsDiffByName.values())
    print("The sum of absolute difference of all layers weight's and bias's is : %s" % finalDiffVal )

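For further debugging, uncomment the line DEBUG := 1 in Makefile.config, rebuild the code, and run:

gdb /path/to/caffe/build/tools/caffe

Once gdb has started, use the run command followed by the remaining arguments: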

run train -solver /path/to/solver.prototxt

`Examples`

`LeNet on MNIST`

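The purpose of this section is to show the concrete steps: preparing the dataset, training the model, and timing the model. It is based on two sources linked from the original page.

Prepare the dataset: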

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh # downloads MNIST dataset
./examples/mnist/create_mnist.sh # creates dataset in LMDB format

Train the model:

# Reduce the number of iterations from 10K to 1K to quickly run through this example
sed -i 's/max_iter: 10000/max_iter: 1000/g' examples/mnist/lenet_solver.prototxt
./build/tools/caffe train -solver examples/mnist/lenet_solver.prototxt

Time the forward and backward passes (excluding weight updates):

./build/tools/caffe time --model=examples/mnist/lenet_train_test.prototxt -iterations 50 # runs on CPU

For more accurate timing, the numactl utility can be used to control where memory buffers are allocated, for example in MCDRAM:

numactl -i all /path/to/caffe/build/tools/caffe time --model=train_val.prototxt -iterations $NUMITER

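Test the trained model. This example tests on the validation set; in practice, a separate dataset should be used for testing, with either the format below or the one described above.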

# the file with the model should have a 'phase: TEST'
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt \
  -weights examples/mnist/lenet_iter_1000.caffemodel -iterations 50

`Dogs vs Cats`

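Sign up for a Kaggle account and download the data. Note that you cannot simply run wget, because you must be logged in to Kaggle. Log in to Kaggle, download the data, and transfer it to your machine.

Unzip dogvscat.zip and run the dogvscat.sh script contained in the zip file. The script is shown below: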

#!/usr/bin/env sh
CAFFE_ROOT=/path/to/caffe
mkdir dogvscat
DOG_VS_CAT_FOLDER=/path/to/dogvscat

cd $DOG_VS_CAT_FOLDER
## Download datasets (requires first a login)
#https://www.kaggle.com/c/dogs-vs-cats/download/train.zip
#https://www.kaggle.com/c/dogs-vs-cats/download/test1.zip

# Unzip train and test data
sudo apt-get -y install unzip
unzip train.zip -d .
unzip test1.zip -d .

# Format data
python create_label_file.py # creates 2 text files with labels for training and validation
./build_datasets.sh # build lmdbs

# Download ImageNet pretrained weights (takes ~20 min)
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet

# Fine-tune weights in the AlexNet architecture (takes ~100 min)
$CAFFE_ROOT/build/tools/caffe train -solver $DOG_VS_CAT_FOLDER/dogvscat_solver.prototxt \
    -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# Classify test dataset
cd $DOG_VS_CAT_FOLDER
python convert_binaryproto2npy.py
python dogvscat_classify.py # Returns prediction.txt (takes ~30 min)

# A better approach is to train five AlexNets w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks
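
The script above calls create_label_file.py, which is not reproduced in this article. The following is only a hypothetical sketch of what such a labeling step might look like, not the author's script. It assumes the Kaggle training files are named cat.N.jpg and dog.N.jpg, that cats get label 0 and dogs label 1, and that 10% of the images are held out for validation; all three are assumptions, as are the function names.

```python
import random

LABELS = {"cat": 0, "dog": 1}  # assumed label mapping

def label_lines(filenames):
    """Turn file names such as 'cat.12.jpg' into 'cat.12.jpg 0' lines."""
    lines = []
    for name in sorted(filenames):
        prefix = name.split(".")[0]
        if prefix in LABELS:          # ignore files that are not cat/dog images
            lines.append("%s %d" % (name, LABELS[prefix]))
    return lines

def split_train_val(lines, val_fraction=0.1, seed=0):
    """Shuffle once with a fixed seed, then hold out a validation slice."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Stand-in for a directory listing; a real script would read the train folder
example_files = ["cat.%d.jpg" % i for i in range(5)]
example_files += ["dog.%d.jpg" % i for i in range(5)]
example_files.append("notes.txt")

all_lines = label_lines(example_files)
train_lines, val_lines = split_train_val(all_lines)
print(len(train_lines), "training lines,", len(val_lines), "validation line(s)")
```

Shuffling with a fixed seed keeps the split identical across runs, which matches the reproducibility advice in the debugging section.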

I submitted the results to Kaggle and obtained an accuracy of 0.97566 (ranked 15th out of 215 entries).

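`PASCAL VOC Classification`

Unzip voc2012.zip and run the voc2012.sh script (included in the zip file and shown below). Run sudo chmod 700 *.sh to make sure the scripts are executable. The script trains and runs AlexNet.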

#!/usr/bin/env sh

# Copy and unzip voc2012.zip (it contains this file) then run this file. But first
#  change paths in: voc2012.sh; build_datasets.sh; solvers/*; nets/*; classify.py

# As you run various files, you can ignore the following error if it shows up:
#  libdc1394 error: Failed to initialize libdc1394

# set Caffe root directory
CAFFE_ROOT=/path/to/caffe
VOC=/path/to/voc2012

chmod 700 *.sh

# Download datasets
# Details: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#devkit
if [ ! -f VOCtrainval_11-May-2012.tar ]; then
  wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
fi
# VOCtrainval_11-May-2012.tar contains the VOC folder with:
#  JPEGImages: all jpg images
#  Annotations: objects and corresponding bounding box/pose/truncated/occluded per jpg
#  ImageSets: breaks the images by the type of task they are used for
#  SegmentationClass and SegmentationObject: segmented images (duplicate directories)
tar -xvf VOCtrainval_11-May-2012.tar

# Run Python scripts to create labeled text files
python create_labeled_txt_file.py

# Execute shell script to create training and validation lmdbs
# Note that lmdbs directories w/the same name cannot exist prior to creating them
./build_datasets.sh

# Execute following command to download caffenet pre-trained weights (takes ~20 min)
#  if weights exist already then the command is ignored
$CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_reference_caffenet

# Fine-tune weights in the AlexNet architecture (takes ~60 min)
# you can also choose one of six solvers: pascal_solver[1-6].prototxt
$CAFFE_ROOT/build/tools/caffe train -solver $VOC/solvers/voc2012_solver.prototxt \
  -weights $CAFFE_ROOT/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

# The lines below are not really needed; they served as examples on how to do some tasks

# Test against voc2012_val_lmbd dataset (name of lmdb is the model under PHASE: test)
$CAFFE_ROOT/build/tools/caffe test -model $VOC/nets/voc2012_train_val_ft678.prototxt \
  -weights $VOC/weights_iter_5000.caffemodel -iterations 116

# Classify validation dataset: returns a file w/the labels of the val dataset
#  but it doesn't report accuracy (that would require some adjusting of the code)
python convert_binaryproto2npy.py
mkdir results
python cls_confidence.py
python average_precision.py

# Note to submit results to the VOC scoreboard, retrain NN using the trainval set
# and test on the unlabeled test data provided by VOC

# A better approach is to train five CNNs w/init parameters from the same distribution,
# fine-tune those five, and compute the average of the five networks
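
The closing comment in the script suggests averaging five fine-tuned networks. One simple way to read that is to average the class probabilities each network predicts for an image and then pick the most likely class. The sketch below shows that idea in plain Python; the probability values are made up for illustration and the function names are hypothetical.

```python
def average_predictions(per_model_probs):
    """Average class probabilities across models for a single image."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models
            for c in range(n_classes)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up two-class outputs from five networks for one image
probs = [
    [0.60, 0.40],
    [0.40, 0.60],
    [0.70, 0.30],
    [0.55, 0.45],
    [0.45, 0.55],
]

avg = average_predictions(probs)
pred = argmax(avg)
print("averaged probabilities:", avg, "| predicted class:", pred)
```

Two of the five networks disagree with the final prediction here, which is the kind of individual error that averaging is meant to smooth out.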

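More VOC information (for readers who want to learn more about VOC):

PASCAL VOC datasets
Comparing methods or design choices:
Use the complete VOC2007 data, for which all annotations (including test annotations) are available
Report cross-validation results using the VOC2012 "trainval" set alone (test annotations were not provided from 2008 to 2012)
The most common metric is average precision (AP): the area under the precision/recall curve
VOC 2012 data summary
A completely new dataset was introduced in 2008, and more data was added each year. Results are therefore commonly reported on VOC2007 and VOC2012 (or VOC2011, since no additional data was added for the classification and detection tasks between 2011 and 2012)
20 classes
Training: 5,717 images, 13,609 objects
Validation: 5,823 images, 13,841 objects
Test: 10,991 images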
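
The average precision metric mentioned in the list above can be sketched for a single class as follows. This is a simple non-interpolated version: it ranks the images by confidence and averages the precision measured at each true positive. The official VOC evaluation uses an interpolated variant, so this sketch illustrates the idea rather than reproducing the devkit procedure. The scores and labels are made up.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: classifier confidences, one per image
    labels: 1 if the image contains the class, else 0
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(label for _, label in ranked)
    if n_positive == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # precision among the top 'rank' images
            precision_sum += true_positives / float(rank)
    return precision_sum / n_positive

# Made-up confidences and ground truth for four images
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print("AP:", ap)
```

In this example the positives sit at ranks 1 and 3, so the precisions 1/1 and 2/3 are averaged.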

`Current Caffe Use Cases`

Below is a small sample of Caffe use cases. For a more complete list, see the Caffe Model-Zoo.

Ross Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR, 2014. Code available.
Ross Girshick, "Fast R-CNN." ICCV, 2015. Code available.
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, "Faster R-CNN: towards real-time object detection." NIPS, 2015. Code available.
Jonathan Long, Evan Shelhamer, Trevor Darrell, "Fully convolutional networks for semantic segmentation." CVPR, 2015.

`Further Reading`

Caffe home page
Soumith Chintala, "Intel are CPU magicians." October 2015
Pradeep Dubey, "Myth Busted: General Purpose CPUs Can’t Tackle Deep Neural Network Training." October 2015
Dipankar Das et al., "Distributed Deep Learning Using Synchronous Stochastic Gradient Descent." February 2016
Yann LeCun, Yoshua Bengio and Geoffrey Hinton, "Deep Learning." Nature, May 2015
Ian Goodfellow, Yoshua Bengio and Aaron Courville, "Deep Learning." MIT Press, 2016
Jeff Donahue, "Sequences in Caffe." CVPR Tutorial, June 2015
Andrej Karpathy, "Caffe Tutorial." Stanford CS 231n, 2015
Xinlei Chen, "Caffe Tutorial." Carnegie Mellon University 16824, 2015
Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks." May 2015
Oriol Vinyals et al., "Show and Tell: A Neural Image Caption Generator." CVPR, June 2015
Wei Hu et al., "Deep convolutional neural networks for hyperspectral image classification." Journal of Sensors, 2015
Clarifai demo: pick an image or video from a URL, or submit your own
MIT scene recognition demo: pick a scene image from a URL, or submit your own
