This is the official implementation of the Learning By Doing NeurIPS 2021 Competition!
- To promote work at the intersection of control theory, reinforcement learning, and causality!
- To allow others to try out new techniques with our competition data!
- To help future competitions!
- To enshrine our prize winners! π
Once you've gotten everything installed, you can easily replicate the final results of the competition! π
Track CHEM π§βπ¬: python run_chem_winners.py
Team | Score | Rank |
---|---|---|
Ajoo | 0.0890 | 1 π |
TeamQ | 0.3385 | 2 |
GuineaPig | 0.3386 | 3 |
Track ROBO π€: python run_robo_winners.py
Team | Score | Rank |
---|---|---|
Ajoo | 0.918 | 1 π |
TeamQ | 16.121 | 2 |
Factored.ai (jmunozb) | 29.539 | 3 |
Congratulations to our competition prize winners! We also thank teams Ajoo, TeamQ, and KC988 for discussing their solutions at our NeurIPS 2021 session!
Track CHEM π§βπ¬
Feature | Status |
---|---|
Competition systems | β |
Final phase evaluation | β |
Prize winner submissions | β |
Starter kit | β |
Training data | β |
Training data generation script | β |
Prize winner implementations | π |
Intermediate phase evaluation | β |
Codalab bundle infrastructure | β |
Submission evaluation using Docker | β |
Track ROBO π€
Feature | Status |
---|---|
Competition systems | β |
Final phase evaluation | β |
Prize winner controllers | β |
Starter controller | β |
Zero baseline controller | β |
Training trajectories | β |
Target trajectories | β |
Oracle controller | β |
Training trajectories generation script | β |
Target trajectories generation script | β |
Intermediate phase evaluation | β |
Codalab bundle infrastructure | β |
Controller evaluation using Docker | β |
Legend:
- β : released!
- π: available at link below
- β³: not yet available -- currently in progress
- β: not planned (but please contact us if you need it!)
We'd love to hear your feedback for any ways we can improve upon our ideas for future competitions at the intersection of control theory, reinforcement learning, and causality!
Tell us here: LearningByDoing AT math DOT ku DOT dk
Thank you, and keep Learning By Doing!
This implementation was tested using Python3.7. Different versions of Python3 should work; please let us know if you have difficulty. The setup.py
file specifies the same version of the libraries used on Codalab.
Optionally, first create a Python3 virtualenv, to which the LBD package will be installed.
These instructions use a default virtualenv directory of ~/envs/
.
virtualenv -p /usr/bin/python3 ~/envs/lbd
source ~/envs/lbd/bin/activate
git clone [email protected]:LearningByDoingCompetition/learningbydoing-comp-staging.git
cd learningbydoing-comp-staging/
pip install -e .
To run the tests for this please, run the following:
pytest tests/
To run the final phase of Track CHEM, run the following:
python run_chem_winners.py
Example output (evaluation time may differ):
LearningByDoing: Track CHEM Final Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rank 1: Ajoo
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 12/12 [00:49<00:00, 4.14s/it]
Results:
Loss: 0.0890
Duration (seconds): 49.74
Rank 2: TeamQ
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 12/12 [00:46<00:00, 3.86s/it]
Results:
Loss: 0.3385
Duration (seconds): 46.33
Rank 3: GuineaPig
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 12/12 [00:45<00:00, 3.76s/it]
Results:
Loss: 0.3386
Duration (seconds): 45.17
To run the final phase of Track ROBO, run the following:
python run_robo_winners.py
Example output (evaluation time may differ):
LearningByDoing: Track ROBO Final Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rank 1: Ajoo
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 240/240 [01:17<00:00, 3.12it/s]
Loss: 0.918
Controller-Runtime: 28.4
Rank 2: TeamQ
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 240/240 [02:30<00:00, 1.60it/s]
Loss: 16.121
Controller-Runtime: 87.5
Rank 3: jmunozb
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 240/240 [02:23<00:00, 1.68it/s]
Loss: 29.539
Controller-Runtime: 51.9
Submissions for Track CHEM are evaluated using the scripts/run_chem_evaluation.py
script. The following example evaluates Team Ajoo's CHEM submission:
python scripts/run_chem_evaluation.py -i data/CHEM/prize_winners/1_Ajoo/ -v
Example output (evaluation time may differ):
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 12/12 [00:50<00:00, 4.19s/it]
Results:
Loss: 0.0890
Duration (seconds): 50.32
Training data are generated for Track CHEM using the scripts/gen_chem_training_data.py
script. For example, the following command generates and saves training data to the directory output/chem_train/
as a zip file (-z
):
python scripts/gen_chem_training_data.py -o output/chem_train/ -z
Controllers for Track ROBO are evaluated using the scripts/run_robo_controller.py
script. The following example evaluates the starter kit controller:
python scripts/run_robo_controller.py -i data/ROBO/controllers/starter_kit/controller/ -t data/ROBO/target_trajectories/
Example output (evaluation time may differ):
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 240/240 [01:32<00:00, 2.60it/s]
Loss: 67.156
Controller-Runtime: 32.5
Training data are generated for Track ROBO using the scripts/gen_robo_training_data.py
script. For example, the following command generates and saves training trajectories to the directory output/robo_train/
while saving debug animations (-d
) to output/robo_train/debug/
:
python scripts/gen_robo_training_data.py -o output/robo_train/ -d
The script will default to 50 trajectories per robot system, although this setting can be changed through the -n
argument.
Debug animations are informative but increase script runtime. The -d
argument is optional and thus doesn't need to be specified in order to generate the training data more quickly.
Track ROBO's test data are generated using the scripts/gen_robo_test_data.py
script. The following example generates and saves target trajectories to the directory output/robo_test/
while saving debug animations (-d
) to output/robo_test/debug/
:
python scripts/gen_robo_test_data.py -o output/robo_test/ -d
The target trajectories are a mix of curved trajectories, straight-line traversals, circular arcs, rectangular trajectories, and the letters "L", "B", and "D" that spell out the competition!
Similar to the training data script, the debug argument (-d
) is optional and can be removed to generate the target trajectories more quickly.
This repository contains the prize winner solutions. For Track CHEM, these are the csv submission files. For Track ROBO, the submissions are controllers.
Below are the implementations from our prize winners. Specifically, implementations for Track CHEM winners are available below, as our repository only contains the csv submissions.
Ajoo:
TeamQ:
Factored.ai (jmunozb):
GuineaPig:
The Learning By Doing NeurIPS 2021 Competition Standalone Code is licensed under the GNU Affero General Public License v3.0 (AGPLv3). Please see the license described in the LICENSE
file for more information.
@inproceedings{weichwald2022learning,
title={{Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning}},
author={Sebastian Weichwald and SΓΈren Wengel Mogensen and Tabitha Edith Lee and Dominik Baumann and Oliver Kroemer and Isabelle Guyon and Sebastian Trimpe and Jonas Peters and Niklas Pfister},
booktitle={Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
pages={246--258},
year={2022},
volume={176},
series={Proceedings of Machine Learning Research},
month={06--14 Dec},
publisher={PMLR},
url={https://proceedings.mlr.press/v176/weichwald22a.html},
}