This repository has been archived by the owner on Aug 5, 2022. It is now read-only.

How to create ImageNet LMDB

Feng Zou edited this page Nov 17, 2017 · 10 revisions

LMDB (Lightning Memory-Mapped Database) is a key-value store supported by the Intel distribution of Caffe*. One of its main advantages is high throughput. Training and validation datasets can be converted into LMDB format.

## General scenario and parameters description

The Intel distribution of Caffe provides a script that helps users create an LMDB.

General steps:

  1. Sign up at http://image-net.org and download the training and validation images. Store the training files and the validation files in separate directories.
  2. Run the script that downloads the auxiliary data:
$ ./data/ilsvrc12/get_ilsvrc_aux.sh
  3. If necessary, pre-process the training/validation data (e.g., resize the images to a common height and width).
  4. Create the LMDB with the following script:
$ examples/imagenet/create_imagenet.sh

Before running it, verify the following parameters of the script:

  • TRAIN_DATA_ROOT and VAL_DATA_ROOT: the paths to the training and validation data
  • resize_height: the height to which each image is resized
  • resize_width: the width to which each image is resized
  • shuffle: if set, the entries are written to the LMDB in random order
  • encoded: if true, the images are stored in the LMDB in encoded (compressed) form
  • $DATA/train.txt or $DATA/val.txt: the text file that lists the images used for training or validation together with their class labels
  • $EXAMPLE/ilsvrc12_train_lmdb or $EXAMPLE/ilsvrc12_val_lmdb: the location where the LMDB will be saved
  5. Use the created LMDB in the Intel distribution of Caffe.
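
The parameters listed above are ultimately passed to Caffe's convert_imageset tool. The sketch below illustrates that mapping. It is not the actual contents of create_imagenet.sh: `build_convert_cmd` is a hypothetical helper that only prints a command line, and the 256x256 target size and the handling of `RESIZE` and `ENCODE` are assumptions to check against your copy of the script.

```shell
# Sketch only: prints a convert_imageset command line, does not execute it.
# build_convert_cmd is a hypothetical helper, not part of Caffe.
# Usage: build_convert_cmd <data_root> <list_file> <output_lmdb>
build_convert_cmd() {
  data_root=$1
  list_file=$2
  out_db=$3

  # A value of 0 asks convert_imageset to keep the original image size.
  height=0
  width=0

  # Assumed convention: RESIZE=true means "resize to 256x256".
  if [ "${RESIZE:-false}" = true ]; then
    height=256
    width=256
  fi

  cmd="convert_imageset --resize_height=${height} --resize_width=${width} --shuffle"

  # Assumed convention: ENCODE=true adds the --encoded flag.
  if [ "${ENCODE:-false}" = true ]; then
    cmd="$cmd --encoded"
  fi

  # Positional arguments: image root, list file, output database.
  echo "$cmd $data_root $list_file $out_db"
}
```

For example, `RESIZE=true ENCODE=true build_convert_cmd /data/imagenet/train/ data/ilsvrc12/train.txt examples/imagenet/ilsvrc12_train_lmdb` prints one command line that can be reviewed before anything is run. The list-file path here is an assumed location.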

## Example execution

For the purposes of this guide, the example imports training and validation data from ImageNet.

  1. Download ImageNet training and validation data.
  2. Navigate to the imagenet directory, e.g., cd path/to/caffe/examples/imagenet
  3. Edit the create_imagenet.sh script so that it contains the following settings:
TRAIN_DATA_ROOT=/data/imagenet/train/
VAL_DATA_ROOT=/data/imagenet/val/
RESIZE=true
...
ENCODE=true
...
  4. Return to the Caffe root directory and run the script, e.g.: ./examples/imagenet/create_imagenet.sh
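
Before a long conversion run, it can help to confirm that the inputs are in place. The sketch below is a hypothetical pre-flight check and is not part of create_imagenet.sh. It assumes that each line of the list file has the form `relative/image/path label`, with no spaces in the path and a non-negative integer label. It only checks that the data directory and the list file exist and that every non-blank line matches that form. It does not check that the listed images exist.

```shell
# Hypothetical pre-flight check; names are illustrative, not part of Caffe.
# Usage: check_inputs <data_root> <list_file>
check_inputs() {
  data_root=$1
  list_file=$2

  if [ ! -d "$data_root" ]; then
    echo "error: not a directory: $data_root" >&2
    return 1
  fi
  if [ ! -f "$list_file" ]; then
    echo "error: list file not found: $list_file" >&2
    return 1
  fi

  # Count non-blank lines that are not "<path> <non-negative integer>".
  bad=$(awk 'NF == 0 { next }
             NF != 2 || $2 !~ /^[0-9]+$/ { n++ }
             END { print n + 0 }' "$list_file")
  if [ "$bad" -ne 0 ]; then
    echo "error: $bad malformed line(s) in $list_file" >&2
    return 1
  fi

  echo "ok: $list_file looks well-formed"
}
```

A call such as `check_inputs /data/imagenet/train/ data/ilsvrc12/train.txt` (the list-file path is an assumed location) returns a non-zero status on the first problem it finds, so it can guard the conversion step in a wrapper script.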

Output of running create_imagenet.sh:

Creating training lmdb...
...
Creating val lmdb...
I1124 10:58:44.212462 193703 convert_imageset.cpp:123] Shuffling data
I1124 10:58:44.219236 193703 convert_imageset.cpp:126] A total of 50000 images.
I1124 10:58:44.219633 193703 db_lmdb.cpp:72] Opened lmdb examples/imagenet/ilsvrc12_val_lmdb
I1124 10:58:51.641278 193703 convert_imageset.cpp:184] Processed 1000 files.
I1124 10:58:58.952800 193703 convert_imageset.cpp:184] Processed 2000 files.
I1124 10:59:05.942912 193703 convert_imageset.cpp:184] Processed 3000 files.
...
Done.
  5. The script should create the ilsvrc12_train_lmdb and ilsvrc12_val_lmdb directories at the location given by the EXAMPLE variable.
  6. Update the .prototxt file of the model used in the Intel distribution of Caffe, e.g.:
data_param {
   source: "examples/imagenet/ilsvrc12_train_lmdb"
   batch_size: 256
   backend: LMDB
 }
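
A quick sanity check before training is to confirm that the path given in `source` really is an LMDB directory. The sketch below assumes the database uses LMDB's default directory layout, in which the directory holds a `data.mdb` file. The hypothetical helper only checks for a non-empty `data.mdb` and does not validate the stored records.

```shell
# Hypothetical helper; not part of Caffe.
# Usage: is_lmdb_dir <path>
is_lmdb_dir() {
  db=$1
  # -s is true only when the file exists and is not empty.
  if [ -d "$db" ] && [ -s "$db/data.mdb" ]; then
    echo "ok: $db"
  else
    echo "missing or empty LMDB: $db" >&2
    return 1
  fi
}
```

For the paths used in this guide, that would be `is_lmdb_dir examples/imagenet/ilsvrc12_train_lmdb && is_lmdb_dir examples/imagenet/ilsvrc12_val_lmdb`, run from the Caffe root directory.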

*Other names and brands may be claimed as the property of others
