
runtraining.sh

Chris Churas edited this page Apr 27, 2018 · 11 revisions

This script runs CDeep3M training to generate a trained model. CDeep3M actually trains three separate models (1fm, 3fm, and 5fm), whose output is described below.

This script is a wrapper that invokes CreateTrainJob.m and run_all_train.sh.

Usage:

usage: runtraining.sh [-h] [--1fmonly] [--numiterations NUMITERATIONS]
                              [--base_lr BASE_LR] [--power POWER] 
                              [--momentum MOMENTUM] 
                              [--weight_decay WEIGHT_DECAY] 
                              [--average_loss AVERAGE_LOSS] 
                              [--lr_policy POLICY] [--iter_size ITER_SIZE] 
                              [--snapshot_interval SNAPSHOT_INTERVAL]
                              augtrainimages trainoutdir

              Version: 0.20.0

              Trains Deep3M model using caffe with training data
              passed into script. 

              For further information about parameters below please see: 
              https://github.com/BVLC/caffe/wiki/Solver-Prototxt

    
positional arguments:
  augtrainimages       Augmented training data from PreprocessTrainingData.m
  trainoutdir          Desired output directory

optional arguments:
  -h, --help           show this help message and exit
  --1fmonly            Only train 1fm model
  --base_learn         Base learning rate (default 1e-02)
  --power              Used in poly and sigmoid lr_policies. (default 0.8)
  --momentum           Indicates how much of the previous weight will be 
                       retained in the new calculation. (default 0.9)
  --weight_decay       Factor of (regularization) penalization of large
                       weights (default 0.0005)
  --average_loss       Number of iterations to use to average loss
                       (default 16)
  --lr_policy          Learning rate policy (default poly)
  --iter_size          Accumulate gradients across batches through the 
                       iter_size solver field. (default 8)
  --snapshot_interval  How often caffe should output a model and solverstate.
                       (default 2000)
  --numiterations      Number of training iterations to run (default 30000)
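As an illustration of the options above, a run with a shorter schedule might be assembled as follows. This is a minimal sketch: both paths are hypothetical placeholders, the iteration values are arbitrary, and the command is only printed (a dry run) so it can be inspected before launching. The flag names are taken from the usage text above.

```shell
# Hypothetical paths; substitute your own.
augtrainimages="/path/to/augtrain"   # output of PreprocessTrainingData.m
trainoutdir="/path/to/trainout"      # directory the script will create

# Arbitrary example values, smaller than the defaults (30000 and 2000).
numiterations=10000
snapshot_interval=1000

# Dry run: build and print the command instead of executing it.
cmd="runtraining.sh --numiterations $numiterations --snapshot_interval $snapshot_interval $augtrainimages $trainoutdir"
echo "$cmd"

# Assuming one snapshot per interval, each model would produce about
# this many .caffemodel/.solverstate pairs.
expected_snapshots=$(( numiterations / snapshot_interval ))
echo "expected snapshots per model: $expected_snapshots"
```

Once the printed command looks right, it can be run directly. Adding `--1fmonly` would restrict training to the 1fm model.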

This script creates a new directory, given as trainoutdir in the usage above, structured as follows:

Tree view of directory showing only base files and directories

├── 1fm
│   ├── log
│   └── trainedmodel
├── 3fm
│   ├── log
│   └── trainedmodel
├── 5fm
│   ├── log
│   └── trainedmodel
├── caffe_train.sh
├── parallel.jobs
├── readme.txt
├── run_all_train.sh
└── train_file.txt
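The layout above can be checked with a few lines of shell before moving on to prediction. The helper below is a hypothetical sketch, not part of CDeep3M: `check_trainoutdir` is a name introduced here for illustration, and it assumes all three models were requested (a run with `--1fmonly` may not produce the 3fm and 5fm directories).

```shell
# check_trainoutdir DIR: succeed only if DIR contains log and trainedmodel
# subdirectories for each of 1fm, 3fm, and 5fm.
# (Hypothetical helper; assumes a full three-model run.)
check_trainoutdir() {
  local dir="$1" model sub missing=0
  for model in 1fm 3fm 5fm; do
    for sub in log trainedmodel; do
      if [ ! -d "$dir/$model/$sub" ]; then
        echo "missing: $dir/$model/$sub" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, `check_trainoutdir /path/to/trainout && echo "layout ok"` prints the confirmation only when every expected directory is present.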

1fm, 3fm, 5fm

Each of these directories holds one trained model, and all three share the same structure, shown here with the actual files:

├── #fm
│   ├── deploy.prototxt
│   ├── label_class_selection.prototxt
│   ├── log
│   │   ├── caffe.bin.INFO
│   │   ├── caffe.bin.ip-XXX.ubuntu.log.INFO.XXXX
│   │   └── out.log
│   ├── solver.prototxt
│   ├── trainedmodel
│   │   ├── 1fm_classifer_iter_###.caffemodel
│   │   └── 1fm_classifer_iter_###.solverstate
│   ├── train_file.txt
│   ├── train_val.prototxt
│   └── valid_file.txt

The trained model itself is the .caffemodel file under #fm/trainedmodel. The accompanying .solverstate file is needed to resume training but not for prediction. The ### in both file names denotes the training iteration at which the files were written. Because caffe outputs a model and solverstate periodically during training (see --snapshot_interval), #fm/trainedmodel can contain multiple .caffemodel files from different iterations.
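Since several snapshots can coexist, it is often useful to pick the one from the highest iteration. The function below is a minimal sketch of one way to do that: `latest_caffemodel` is a hypothetical helper name, and it assumes file names end in `_iter_<number>.caffemodel` as in the tree above. It compares iterations numerically, because a plain alphabetical sort would rank iteration 2000 above 10000.

```shell
# latest_caffemodel DIR: print the .caffemodel in DIR with the highest
# iteration number; return non-zero if none is found.
# (Hypothetical helper; assumes names end in _iter_<N>.caffemodel.)
latest_caffemodel() {
  local dir="$1" f n best_n=-1 best=""
  for f in "$dir"/*_iter_*.caffemodel; do
    [ -e "$f" ] || continue            # glob matched nothing
    n="${f##*_iter_}"                  # drop everything through "_iter_"
    n="${n%.caffemodel}"               # drop the extension
    case "$n" in
      ''|*[!0-9]*) continue ;;         # skip non-numeric iteration suffixes
    esac
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="$f"
    fi
  done
  [ -n "$best" ] && printf '%s\n' "$best"
}
```

A call such as `latest_caffemodel /path/to/trainout/1fm/trainedmodel` prints the path of the most advanced snapshot for the 1fm model; the path shown is a placeholder.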
