
Multi-node CIFAR10

This page is part of the Multi-node guide; it is recommended to start there. It is assumed that you have already completed the compilation tutorial.

Introduction

This tutorial explains the basics of distributed training with Intel® Distribution of Caffe*. The cifar10 example is a good warm-up, so please try it before you start distributed training on bigger networks. First, complete the single-node cifar10 tutorial at http://caffe.berkeleyvision.org/gathered/examples/cifar10.html. Multi-node distributed training is an extension of the single-node capabilities, therefore understanding the single-node case is necessary to follow what is going on.

Enabling multi-node

Nodes communicate with each other using MPI (Message Passing Interface). Launching Intel® Distribution of Caffe* in distributed training mode is as simple as adding the command-line option --param_server=mpi when running under mpirun or mpiexec. If this option is not specified, each instance will run in single-node mode independently.
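For example, the same training command differs only in the launcher and the extra flag. A minimal sketch, assuming a built caffe binary and the standard cifar10 solver (adjust the paths to your setup):

# single-node training (ordinary invocation)
./build/tools/caffe train --solver=examples/cifar10/cifar10_full_solver.prototxt

# the same training distributed over 2 MPI processes
mpirun -host localhost -n 2 ./build/tools/caffe train --solver=examples/cifar10/cifar10_full_solver.prototxt --param_server=mpi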

Training

When you are done testing single-node, you are ready to try multi-node training on a single machine.

"Multi-node" on a single machine

The example here is prepared for an easy start, therefore it will be run on a single machine spawning multiple processes just like they would on a cluster.

Please try some MPI tutorials before proceeding, i.e. from http://mpitutorial.com/tutorials/mpi-hello-world/.
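Before launching Caffe under MPI, it is worth confirming that mpirun itself works on your machine. A minimal sanity check (assuming mpirun is on your PATH):

mpirun -host localhost -n 4 hostname

This should print the hostname four times, once per spawned process.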

The basic scenario works like this:

mpirun -host localhost -n <NUMBER_OF_PROCESSES> /path/to/built/tools/caffe train --solver=/path/to/proto --param_server=mpi

The mpirun (mpiexec) command launches N (-n) processes on a comma-separated list of hosts (-host). The program to execute, together with its command-line options, is given as the remaining arguments (very similar to running under Valgrind or GDB). Therefore, /path/to/built/tools/caffe train --solver=/path/to/proto --param_server=mpi will be executed <NUMBER_OF_PROCESSES> times on the single local host. When the flag --param_server=mpi is passed to Intel® Distribution of Caffe* in this form, the train function will use all processes available in MPI_COMM_WORLD to train in multi-node mode.
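The same invocation extends to several machines by listing them in -host. A hypothetical sketch with two hosts (node1 and node2 are placeholder hostnames; the caffe binary, solver, and dataset must be reachable at the same paths on both machines):

mpirun -host node1,node2 -n 8 /path/to/built/tools/caffe train --solver=/path/to/proto --param_server=mpi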

An example prepared in examples/cifar10/train_full_multinode_mpi.sh looks as follows:

TOOLS=./build/tools

echo "executing 4 nodes with mpirun"

# One OpenMP thread per Caffe process; -n 4 spawns four processes on the
# local host and -l prefixes each output line with the rank that produced it.
OMP_NUM_THREADS=1 \
mpirun -l -host 127.0.0.1 -n 4 \
$TOOLS/caffe train --solver=examples/cifar10/cifar10_full_solver.prototxt --param_server=mpi

There is no need to change the solver protobuf configuration.
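Assuming the CIFAR10 LMDB has already been prepared as described in the single-node tutorial, the script is launched from the Caffe root directory, for example:

cd /path/to/caffe
./data/cifar10/get_cifar10.sh          # download the dataset (skip if already done)
./examples/cifar10/create_cifar10.sh   # convert it to LMDB (skip if already done)
./examples/cifar10/train_full_multinode_mpi.sh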

This script runs four processes (one OpenMP thread per process) on the same database, and in each iteration every process draws a random batch. Each process uses the same protobuf configuration, so each retrieves the same number of images to build its batch for a given iteration. Data is accessed in parallel, and images are randomized only if shuffle: true is specified in the data_param section of the TRAIN phase data layer (in the net protobuf referenced by the solver). The gradients from all processes are accumulated on the root node (rank 0) and averaged. This means gradients are effectively computed for a total batch size of 400 images (the default configuration uses a batch size of 100 per process). You can also shuffle the data yourself in a unique way per node, or split the data into disjoint subsets.
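To confirm the per-process batch size (and hence the effective 4 x 100 = 400 images per iteration), you can inspect the network definition referenced by the solver. A quick check, assuming the default file layout; the exact values depend on your copy of the prototxt:

grep batch_size examples/cifar10/cifar10_full_train_test.prototxt
# batch_size: 100   <- TRAIN phase data layer
# batch_size: 100   <- TEST phase data layer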

Now, you can go back to the main multi-node guide and continue or try the googlenet tutorial linked in the practical part of the guide.


*Other names and brands may be claimed as the property of others.
