# HyperPRI - Hyperspectral Plant Root Imagery
This GitHub repo contains the source code used to demonstrate how the hyperspectral data included in the HyperPRI dataset improves binary segmentation performance for deep learning segmentation models.
*This code's public release is a work in progress and will be cleaned up following submissions to bioRxiv and Elsevier's COMPAG journal.*
- Oct-15-2023: Initial upload/release of dataset
- Mar-25-2024: Included hyperspectral data for the viewing pane's material (Lexan)
- Jun-21-2024: Set aside rhizobox 40 as test data (dates: Aug-15 and Aug-24)
Preprint: bioRxiv 2023.09.29.559614v3
YouTube: Dataset Video
Data in HyperPRI enhances plant science analyses and provides challenging features for machine learning models.
- Hyperspectral data can supplement root analysis
- Study root traits across time, from seedling to reproductive maturity
- Thin object features: 1-3 pixels wide
- High correlation between the high-resolution channels of hyperspectral data
There are a number of related CV tasks for this dataset:
- Compute root characteristics (length, diameter, angle, count, system architecture, hyperspectral)
- Determine root turnover
- Observe drought resiliency and response
- Compare multiple physical and hyperspectral plant traits across time
- Investigate texture analysis techniques
- Segment roots vs. soil
- Hyperspectral Data (400–1000 nm, every 2 nm)
- Temporal Data: Imaged for 14 or 15 timesteps across two months
  - Drought: Aug-06 to Aug-19, 78–91 days after planting (stage R6)
  - Drought: Jun-28 to Jul-21, 39–62 days after planting (stages V7–V9)
- Fully-annotated segmentation masks
  - Includes annotations for peanut nodules and pegs
- Box weights at each timestep
  - Baseline Measurements: Empty box, dry soil, wet soil
- 32 Peanut (*Arachis hypogaea*) rhizoboxes – 358 images
- 32 Sweet Corn (*Zea mays*) rhizoboxes – 390 images
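
As a quick-start illustration, each `.dat`/`.hdr` pair can likely be read with the Spectral Python (SPy) package, assuming the files are ENVI-formatted (an assumption, not a statement from the dataset docs); the file path below is illustrative:

```python
# Hedged sketch: load one HyperPRI hyperspectral cube with Spectral Python (SPy),
# assuming the .hdr/.dat pairs are ENVI-formatted. The file name is illustrative.
import spectral

img = spectral.open_image("Datasets/HyperPRI/Peanut_968x608/hsi_files/example.hdr")
cube = img.load()   # numpy-backed array of shape (rows, cols, bands)
print(cube.shape)   # bands sample 400-1000 nm every 2 nm
```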
The primary Python packages used are PyTorch, PyTorch Lightning, and related utilities. See the `environment.yml` file for specific versions. To set up and run the experiments:
1. Create a Conda virtual environment (ideally) with the packages requested in the provided `environment.yml` file.
   - Additional instructions for using the YAML file may be found on the Conda site.
2. Dataset directory setup. The full directory path from the base `HyperPRI/` repository is `Datasets/HyperPRI/`.
   - If using, place the JSON/CSV splits data in a `Datasets/HyperPRI/data_splits` subdirectory.
   - Per plant type (e.g., Peanut, Sweet Corn), place the files in a `Datasets/HyperPRI/{Peanut, SweetCorn}_968x608` subdirectory, which hosts three of its own subdirectories: `hsi_files`, `mask_files`, and `rgb_files`. As the names suggest, the HSI `.dat` and `.hdr` files should be in `hsi_files`, and the PNG mask and image files should be in the `mask_files` and `rgb_files` subdirectories, respectively.
   - Please note that the paper only used a `Peanut_968x608` subdirectory.
3. Across all model training, the following holds (illustrative configuration sketches follow this list):
   - `kfold_train.py`: Set `start_split` and `num_splits` to 0 and 5, respectively. Set `n_seeds` to the number of training seeds desired per model.
   - `src/Experiments/params_HyperPRI.py`: Batch size of 2. Adam optimization with 0.001 LR, standard $\beta$ values, and no weight decay. `num_classes=1` (binary).
4. For each model, the following architecture parameters were used for the paper. Henceforth, the `.../params_HyperPRI.py` file is referred to as "Parameters":
   - UNET:
     - Parameters: `n_channels=3`.
     - `kfold_train.py`: Set `dataset` equal to `"RGB"`.
     - Everything else should be hardcoded to get ~31.0M parameters.
   - SpectralUNET:
     - Parameters: `n_channels=238`, `patch_size=(608, 700)`, `augment=True`, `spectral_bn_size=1650`. The `hsi_lo` and `hsi_hi` values should be 25 and 263, respectively; this approximately corresponds to 450 nm and 926 nm on the EM spectrum.
     - `kfold_train.py`: Set `dataset` equal to `"HSI"`. Set `MODEL_SHARD` to `True`. This will require at least 2 GPUs to train due to the size of the features when inputting multiple images. If a single GPU is desired, set `MODEL_SHARD` to `False` and decrease the value of `patch_size` in the Parameters until the training fits.
   - CubeNET-64:
     - Parameters: `n_channels=238`, `patch_size=(608, 968)`, `augment=False`, `cube_featmaps=64`. The `hsi_lo` and `hsi_hi` values should be 25 and 263, respectively; this approximately corresponds to 450 nm and 926 nm on the EM spectrum.
     - `kfold_train.py`: Set `dataset` equal to `"HSI"`. Set `MODEL_SHARD` to `False`.
5. Run `kfold_train.py` individually for each set of parameters in the previous step.
6. After training is finished, the models should be saved in their respective directories. Provided this is so, the `kfold_validate.py` file is set up and ready to run out of the box. If segmentation maps of the dataset are requested for all three models, change the `segmaps` list to `[True, True, True]` (see the snippet after this list).
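
A minimal sketch of the `kfold_train.py` settings named in steps 3 and 4. Only the variable names (`start_split`, `num_splits`, `n_seeds`, `dataset`, `MODEL_SHARD`) come from this README; the surrounding layout is illustrative, not the repo's actual code:

```python
# Hypothetical excerpt of kfold_train.py settings (names from this README).
start_split = 0        # first cross-validation split to train
num_splits = 5         # train all five splits
n_seeds = 3            # example value: training seeds desired per model
dataset = "RGB"        # "RGB" for UNET; "HSI" for SpectralUNET and CubeNET-64
MODEL_SHARD = False    # True only for SpectralUNET (shards across >= 2 GPUs)
```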
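Likewise, a hedged sketch of the per-model "Parameters" presets from step 4 and the shared Adam settings from step 3 (LR 0.001, standard $\beta$ values, no weight decay); the dictionary grouping and `make_optimizer` helper are illustrative only:

```python
import torch

# Per-model "Parameters" presets (values from step 4; grouping is illustrative).
PRESETS = {
    "UNET":         dict(n_channels=3),
    "SpectralUNET": dict(n_channels=238, patch_size=(608, 700), augment=True,
                         spectral_bn_size=1650, hsi_lo=25, hsi_hi=263),
    "CubeNET-64":   dict(n_channels=238, patch_size=(608, 968), augment=False,
                         cube_featmaps=64, hsi_lo=25, hsi_hi=263),
}

# Shared optimization from step 3: Adam, LR 0.001, default betas, no weight decay.
def make_optimizer(net: torch.nn.Module) -> torch.optim.Optimizer:
    return torch.optim.Adam(net.parameters(), lr=0.001,
                            betas=(0.9, 0.999), weight_decay=0.0)
```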
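And the one-line `kfold_validate.py` change from step 6 that requests segmentation maps for all three models (the list order here is an assumption):

```python
segmaps = [True, True, True]  # e.g., [UNET, SpectralUNET, CubeNET-64]
```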
Please direct any issues or additional questions to changspencer.
**Comet-ML/TensorBoard Logging:** If certain loggers are undesired, they can be commented out, starting in the `train_net` method of `src/PLTrainer.py`. It is up to the user to trace all the places where removing a logger's instantiation/definition may disrupt the code.
**SpectralUNET Training:** To train SpectralUNET with 1650 neurons in each layer under our coding setup and memory constraints, we had to randomly crop the hyperspectral cubes' height and width to the `patch_size=(608, 700)` listed above. A minimal sketch of that crop follows.
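
This sketch assumes the cube is a `(C, H, W)` PyTorch tensor; the function name and layout are illustrative rather than the repo's actual dataloader code:

```python
import torch

def random_crop_cube(cube: torch.Tensor, patch_hw=(608, 700)) -> torch.Tensor:
    """Randomly crop a (C, H, W) hyperspectral cube to patch_hw in height/width."""
    _, h, w = cube.shape
    ph, pw = patch_hw
    top = torch.randint(0, h - ph + 1, (1,)).item()    # 0 when h == ph
    left = torch.randint(0, w - pw + 1, (1,)).item()
    return cube[:, top:top + ph, left:left + pw]
```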
### Validation Data
Metric | UNET | SpectralUNET | CubeNET-64 |
---|---|---|---|
BCE Loss | 0.080 (0.015) | 0.146 (0.022) | 0.077 (0.014) |
DICE | 0.838 (0.015) | 0.717 (0.044) | 0.844 (0.013) |
+IOU | 0.721 (0.022) | 0.561 (0.053) | 0.730 (0.019) |
AP | 0.919 (0.013) | 0.781 (0.048) | 0.923 (0.012) |
### Test Data
Metric | UNET | SpectralUNET | CubeNET-64 |
---|---|---|---|
Pix Acc | 0.733 (0.123) | 0.751 (0.114) | 0.898 (0.134) |
DICE | 0.162 (0.053) | 0.161 (0.064) | 0.471 (0.206) |
+IOU | 0.089 (0.031) | 0.089 (0.039) | 0.329 (0.163) |
AP | 0.226 (0.079) | 0.220 (0.083) | 0.610 (0.109) |
Note: Metrics shown are the mean across all splits with standard deviation in parentheses. Dataset splits are described in the JSON files located at the dataset URL above.
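
For reference, a minimal sketch of how the overlap metrics above are conventionally computed for binary masks; `+IOU` is read here as the IoU of the positive (root) class, which is an interpretation rather than a statement from the paper:

```python
import torch

def dice_and_pos_iou(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """DICE and positive-class IoU for same-shape binary 0/1 masks."""
    inter = (pred * target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (pred.sum() + target.sum() - inter + eps)
    return dice.item(), iou.item()
```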