GeDa is a Python package that helps you to Get the Data for your project easily.
Install the package with pip:

```shell
pip install geda
```
Download and arrange the VOC semantic segmentation data with its data provider class:

```python
from geda.data_providers.voc import VOCSemanticSegmentationDataProvider

root = "<directory>/<to>/<store>/<data>"  # e.g. "data/VOC"
dataprovider = VOCSemanticSegmentationDataProvider(root)
dataprovider.get_data()
```
Alternatively, create the same data provider by name with the `get_data` function:

```python
from geda import get_data

root = "<directory>/<to>/<store>/<data>"  # e.g. "data/VOC"
dataprovider = get_data(name="VOC_SemanticSegmentation", root=root)
dataprovider.get_data()
```
The `get_data` function currently supports the following names: `MNIST`, `DUTS`, `NYUDv2`, `VOC_InstanceSegmentation`, `VOC_SemanticSegmentation`, `VOC_PersonPartSegmentation`, `VOC_Main`, `VOC_Action`, `VOC_Layout`, `MPII` and `COCO_Keypoints`.
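A name-based factory such as `get_data` is commonly built on a registry that maps each name to a provider class. The sketch below only illustrates that general pattern: `BaseProvider`, `register`, `PROVIDERS`, `make_provider` and the two provider classes are hypothetical stand-ins rather than GeDa's code, and how GeDa actually resolves names has not been checked here.

```python
from typing import Callable, Dict, Type


class BaseProvider:
    """Hypothetical base class standing in for a data provider."""

    def __init__(self, root: str) -> None:
        self.root = root

    def get_data(self) -> None:
        # A real provider would download, extract and arrange files here.
        raise NotImplementedError


# Hypothetical registry: dataset name -> provider class.
PROVIDERS: Dict[str, Type[BaseProvider]] = {}


def register(name: str) -> Callable[[Type[BaseProvider]], Type[BaseProvider]]:
    """Class decorator that records a provider class under `name`."""

    def decorator(cls: Type[BaseProvider]) -> Type[BaseProvider]:
        if name in PROVIDERS:
            raise ValueError(f"Duplicate dataset name: {name!r}")
        PROVIDERS[name] = cls
        return cls

    return decorator


@register("MNIST")
class MNISTProvider(BaseProvider):
    pass


@register("VOC_SemanticSegmentation")
class VOCSemanticSegmentationProvider(BaseProvider):
    pass


def make_provider(name: str, root: str) -> BaseProvider:
    """Return a provider for `name`; nothing is downloaded until get_data()."""
    try:
        cls = PROVIDERS[name]
    except KeyError:
        supported = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"Unknown dataset {name!r}. Supported: {supported}") from None
    return cls(root)
```

With this design, adding a dataset means defining one class and decorating it, and an unknown name fails early with the list of valid names instead of a bare `KeyError`.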
Calling `dataprovider.get_data()` runs the data through the following pipeline:

- Download the data from its source (specified by the `_URLS` variable in each module)
- Extract the files if needed (when `tar`, `zip` or `gz` files were downloaded)
- Move the files to the `<root>/raw` directory
- Find the split ids (file basenames or indices, depending on the dataset)
- Arrange the files, i.e. move (or copy) them from the `<root>/raw` directory to task-specific directories
- [Optional] Create labels in a specific format (e.g. YOLO)
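The extraction, split-id and arrangement steps can be sketched with the Python standard library alone. This is not GeDa's implementation: `extract_to_raw`, `read_split_ids` and `arrange` are hypothetical helpers, and the split file is assumed to hold one basename per line, as VOC's `ImageSets/Segmentation` split files do. Downloading from `_URLS` and label conversion are left out.

```python
import gzip
import shutil
import tarfile
import zipfile
from pathlib import Path
from typing import List


def extract_to_raw(archive: Path, root: Path) -> Path:
    """Extract a zip, a tarball (optionally compressed) or a bare .gz into <root>/raw."""
    raw = root / "raw"
    raw.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(raw)
    elif tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:  # default mode autodetects compression
            # Use the safer "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tf.extractall(raw, filter="data")
            else:
                tf.extractall(raw)
    elif archive.suffix == ".gz":
        # A bare .gz (not a tarball) holds a single compressed file.
        with gzip.open(archive, "rb") as src, open(raw / archive.stem, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        raise ValueError(f"Unsupported archive: {archive}")
    return raw


def read_split_ids(split_file: Path) -> List[str]:
    """Read one file basename per line, skipping blank lines."""
    return [line.strip() for line in split_file.read_text().splitlines() if line.strip()]


def arrange(ids: List[str], src_dir: Path, dst_dir: Path, suffix: str, copy: bool = True) -> int:
    """Copy (or move) <src_dir>/<id><suffix> into dst_dir and return the file count."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for file_id in ids:
        src = src_dir / f"{file_id}{suffix}"
        dst = dst_dir / src.name
        if copy:
            shutil.copy2(src, dst)
        else:
            shutil.move(str(src), str(dst))
    return len(ids)
```

Copying keeps `<root>/raw` intact so the arrangement can be rerun; moving saves disk space on large datasets, which is presumably why the pipeline allows either.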
Resulting directory structure after `get_data(name="VOC_SemanticSegmentation", root="data/VOC")` followed by `dataprovider.get_data()`:
```
.
└── data
    └── VOC
        ├── raw
        │   ├── Annotations
        │   ├── ImageSets
        │   ├── JPEGImages
        │   ├── SegmentationClass
        │   └── SegmentationObject
        ├── SegmentationClass
        │   ├── annots
        │   ├── images
        │   ├── labels
        │   └── masks
        └── trainval_2012.tar
```
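To sanity-check an arranged task directory such as `data/VOC/SegmentationClass`, one option is to count the files in each subdirectory and confirm that every image has a mask with the same basename (in VOC, an image `<id>.jpg` pairs with a mask `<id>.png`). This is a hypothetical sketch: `summarize` and `unmatched` are not part of GeDa, and it assumes, without having checked, that GeDa keeps those basenames inside `images` and `masks`.

```python
from pathlib import Path
from typing import Dict, List


def summarize(task_dir: Path) -> Dict[str, int]:
    """Count the files (recursively) in each immediate subdirectory of task_dir."""
    return {
        sub.name: sum(1 for p in sub.rglob("*") if p.is_file())
        for sub in sorted(task_dir.iterdir())
        if sub.is_dir()
    }


def unmatched(images_dir: Path, masks_dir: Path) -> List[str]:
    """Return basenames present in only one of the two directories."""
    image_stems = {p.stem for p in images_dir.rglob("*") if p.is_file()}
    mask_stems = {p.stem for p in masks_dir.rglob("*") if p.is_file()}
    return sorted(image_stems ^ mask_stems)
```

For example, `summarize(Path("data/VOC/SegmentationClass"))` would report one count per subdirectory (`annots`, `images`, `labels`, `masks`), and an empty list from `unmatched` means images and masks pair up one to one.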
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.