Multinode cifar10
This is part of the Multi-node guide, and it is recommended to start there. It is assumed that you have already completed the compilation tutorial.
This tutorial explains the basics of distributed training using Intel® Distribution of Caffe*. The cifar10 example is a good warm-up; please try it before you start distributed training on bigger networks. First, complete the single-node tutorial for cifar10 at http://caffe.berkeleyvision.org/gathered/examples/cifar10.html. Multi-node distributed training is an extension of the single-node capabilities, therefore knowledge of single-node training is necessary to understand what is going on. From that tutorial you should learn that you need the Cifar10 database, which you can get by running:
data/cifar10/get_cifar10.sh
examples/cifar10/create_cifar10.sh
which will download the dataset and prepare it for calculation.
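Before launching any training, it can help to confirm that the preparation step actually produced its outputs. The following is a minimal sketch, not part of the distribution: `check_cifar10_data` is a hypothetical helper, and the three paths it checks are assumptions based on the stock cifar10 example layout (LMDB backend); adjust them if your setup differs.

```shell
#!/bin/sh
# Hypothetical helper: report which expected CIFAR-10 artifacts are missing.
# ASSUMPTION: the paths below follow the stock example layout with the
# LMDB backend; they are not taken from this tutorial.
check_cifar10_data() {
    root="${1:-.}"   # Caffe source root; defaults to the current directory
    missing=0
    for path in \
        "$root/examples/cifar10/cifar10_train_lmdb" \
        "$root/examples/cifar10/cifar10_test_lmdb" \
        "$root/examples/cifar10/mean.binaryproto"
    do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    # Return 0 only when every expected artifact exists.
    return "$missing"
}
```

Called from the Caffe root as `check_cifar10_data . && echo "dataset ready"`, it prints each missing path to stderr and returns a non-zero status if anything is absent.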
To communicate with each other, nodes use MPI (Message Passing Interface). Launching Intel® Distribution of Caffe* in the distributed training mode is as simple as adding the command line option --param_server=mpi when running with mpirun or mpiexec. If this option is not specified, each instance will run in single-node mode individually.
When you are done testing single-node, you are ready to try multi-node training on a single machine.
The example here is prepared for an easy start, so it runs on a single machine and spawns multiple processes just as it would on a cluster.
Please try some MPI tutorials before proceeding, e.g. http://mpitutorial.com/tutorials/mpi-hello-world/.
The basic scenario works like this:
mpirun -host localhost -n <NUMBER_OF_PROCESSES> /path/to/built/tools/caffe train --solver=/path/to/proto --param_server=mpi
mpirun (or mpiexec) executes N processes (-n) on a comma-separated list of hosts (-host). The program to launch is given as the remaining arguments together with its own command line options (much like running a program under Valgrind or GDB). Therefore, /path/to/built/tools/caffe train --solver=/path/to/proto --param_server=mpi will be executed <NUMBER_OF_PROCESSES> times on the local host. When the flag --param_server=mpi is passed to Intel® Distribution of Caffe* in this form, the train function will use all available processes in MPI_COMM_WORLD for multi-node training.
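The structure of the launch line described above can be sketched as a small shell function that assembles the command from its parts. This is only an illustration: `build_launch_cmd` is a hypothetical name, all argument values are placeholders supplied by the caller, and the function uses only the options shown in the basic scenario (-host, -n, --solver, --param_server=mpi).

```shell
#!/bin/sh
# Sketch: assemble the launch line from the basic scenario.
# build_launch_cmd is a hypothetical helper; it only prints the command
# so it can be reviewed before anything is executed.
build_launch_cmd() {
    hosts="$1"      # comma-separated host list passed to -host
    nprocs="$2"     # number of processes passed to -n
    caffe_bin="$3"  # path to the built caffe tool
    solver="$4"     # path to the solver prototxt
    echo "mpirun -host $hosts -n $nprocs $caffe_bin train --solver=$solver --param_server=mpi"
}
```

For example, `build_launch_cmd localhost 2 /path/to/built/tools/caffe /path/to/proto` prints the two-process variant of the command; printing rather than executing keeps the sketch safe to try on a machine without MPI installed.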
The example prepared in examples/cifar10/train_full_multinode_mpi.sh looks as follows:
TOOLS=./build/tools
echo "executing 4 nodes with mpirun"
OMP_NUM_THREADS=1 \
mpirun -l -host 127.0.0.1 -n 4 \
$TOOLS/caffe train --solver=examples/cifar10/cifar10_full_solver.prototxt --param_server=mpi
There is no need to change the solver protobuf configuration.
This script runs four processes (one thread per process) that use the same database, and in each iteration each of them draws a random batch. Each node uses the same protobuf configuration, so each retrieves the same number of images to create the batch for a given iteration. Data is accessed in parallel, and images are randomized only if shuffle: true is specified in the protobuf configuration, in the data_param section of the data layer for the TRAIN phase. The gradients from all processes are accumulated on the root node (rank 0) and averaged. This means that gradients are effectively calculated for a total batch size of 400 images (4 processes, each with the default configured batch size of 100). You can also shuffle the data yourself in a unique way or split the data into disjoint subsets.
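The batch-size arithmetic above generalizes to any process count: because every process draws its own batch and the gradients are averaged, the effective batch size is the number of processes times the per-process batch size. A minimal sketch, where `effective_batch_size` is a hypothetical helper and not part of the distribution:

```shell
#!/bin/sh
# Sketch: effective batch size when gradients from all processes are averaged.
# effective_batch_size is a hypothetical helper for illustration only.
effective_batch_size() {
    nprocs="$1"             # value passed to mpirun -n
    per_process_batch="$2"  # batch size configured for the data layer
    echo $((nprocs * per_process_batch))
}

# The example script: 4 processes, default batch size of 100.
effective_batch_size 4 100   # prints 400
```

If you change the process count, keep this product in mind, since hyperparameters tuned for a single-node batch size may no longer be appropriate for the larger effective batch.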
Now, you can go back to the main multi-node guide and continue or try the googlenet tutorial linked in the practical part of the guide.
*Other names and brands may be claimed as the property of others