Melting Pot Contest @ AI3617-2024-Spring

Official Repository for Melting Pot Contest Experiments

OPTIONS:
  -h, --help            show this help message and exit
  --num_workers NUM_WORKERS
                        Number of workers to use for sample collection. Setting it zero will use same worker for collection and model training.
  --num_gpus NUM_GPUS   Number of GPUs to run on (can be a fraction)
  --local               If enabled, init ray in local mode.
  --no-tune             If enabled, no hyper-parameter tuning.
  --algo {ppo}          Algorithm to train agents.
  --framework {tf,torch}
                        The DL framework specifier (tf2 eager is not supported).
  --exp {pd_arena,al_harvest,clean_up,territory_rooms}
                        Name of the substrate to run
  --seed SEED           Seed to run
  --results_dir RESULTS_DIR
                        Name of the wandb group
  --logging {DEBUG,INFO,WARN,ERROR}
                        The level of training and data flow messages to print.
  --wandb WANDB         Whether to use WanDB logging.
  --downsample DOWNSAMPLE
                        Whether to downsample substrates in MeltingPot. Defaults to 8.
  --as-test             Whether this script should be run as a test.

For the torch backend, you may need to prepend CUDA_VISIBLE_DEVICES=[DEVICE IDs] to the above command if your algorithm does not seem to find the GPU when GPU use is enabled.
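If you launch training from a Python script rather than a shell, the same device restriction can be applied through the child process environment. The following is a minimal sketch, not part of this repository: the helper name build_training_launch and the script path are assumptions for illustration, while the flags are taken from the option list above. Note that the variable CUDA reads is CUDA_VISIBLE_DEVICES (plural).

```python
import os
import sys


def build_training_launch(script, device_ids, extra_args=()):
    """Return (argv, env) for launching a training script on selected GPUs."""
    env = dict(os.environ)
    # CUDA reads this variable to decide which physical GPUs the process sees.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    argv = [sys.executable, script, "--framework", "torch", *extra_args]
    return argv, env


# Hypothetical script path (an assumption); use your actual training entry point.
argv, env = build_training_launch(
    "baselines/train/run_ray_train.py",
    device_ids=[0, 1],
    extra_args=["--num_gpus", "2", "--exp", "clean_up"],
)
print(env["CUDA_VISIBLE_DEVICES"])
print(" ".join(argv[1:]))
# To actually start training: subprocess.run(argv, env=env, check=True)
```

The sketch only builds the command and environment, so it can be inspected before anything is launched.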

We recommend using the torch backend.

Run Evaluation

python baselines/evaluation/evaluate.py [OPTIONS]

OPTIONS:
  -h, --help            show this help message and exit
  --num_episodes NUM_EPISODES
                        Number of episodes to run evaluation
  --eval_on_scenario EVAL_ON_SCENARIO
                        Whether to evaluate on scenario. If this is False, evaluation is done on substrate
  --scenario SCENARIO   Name of the scenario. This cannot be None when eval_on_scenario is set to True.
  --config_dir CONFIG_DIR
                        Directory where your experiment config (params.json) is located
  --policies_dir POLICIES_DIR
                        Directory where your trained policies are located
  --create_videos CREATE_VIDEOS
                        Whether to create evaluation videos
  --video_dir VIDEO_DIR
                        Directory where you want to store evaluation videos
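The dependency between --eval_on_scenario and --scenario described above can be checked before starting a long evaluation run. This is a small sketch of such a check, assuming a hypothetical helper named check_eval_options; it is not the validation code used inside evaluate.py.

```python
def check_eval_options(eval_on_scenario, scenario):
    """Return the evaluation target, or raise if the options are inconsistent."""
    if eval_on_scenario and scenario is None:
        # Mirrors the rule in the help text: a scenario name is required here.
        raise ValueError("--scenario must be set when --eval_on_scenario is True")
    return "scenario" if eval_on_scenario else "substrate"


print(check_eval_options(True, "clean_up_7"))
print(check_eval_options(False, None))
```

With eval_on_scenario set to False, the scenario name is simply ignored and evaluation targets the substrate.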

Code Structure

.
├── meltingpot          # A forked version of meltingpot used to train and test the baselines
├── setup.py            # Contains all the information about dependencies required to be installed
└── baselines           # Baseline code to train RLlib agents
    ├── customs         # Add custom policies and metrics here
    ├── evaluation      # Evaluate trained models on substrates and scenarios locally
    ├── models          # Add models not registered in RLlib here
    ├── tests           # Unit tests to test environment and training
    ├── train           # All code related to training baselines
    │   └── configs.py  # Modify model and policy configs in this file
    └── wrappers        # Example code to write wrappers around your environment for added functionality

How to Guide

Visualization

How to render trained models?

python baselines/train/render_models.py [OPTIONS]

OPTIONS:
  -h, --help            show this help message and exit
  --config_dir CONFIG_DIR
                        Directory where your experiment config (params.json) is located
  --policies_dir POLICIES_DIR
                        Directory where your trained policies are located
  --horizon HORIZON     No. of environment timesteps to render models

How to visualize scenario plays?

You can also generate videos of agent behavior in various scenarios during local evaluation. To do this, set create_videos=True and video_dir='<PATH to video directory>' when running evaluation. If eval_on_scenario=False, this creates videos of evaluation on the substrate instead.

python baselines/evaluation/evaluate.py --create_videos=True --video_dir='' [OPTIONS]

Note: The code that generates these videos is in the VideoSubject class in meltingpot/utils/evaluation/evaluation.py. Modify this class to change video properties such as codec or fps, or to use a different video writer. If you do not use the meltingpot code from this repo, we have found that the generated videos are rendered very small. To fix that, add rgb_frame = rgb_frame.repeat(scale, axis=0).repeat(scale, axis=1) after line 88 to upscale the image; we used scale=32.
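To see what the repeat-based fix does to a frame, here is a dependency-free sketch of the same nearest-neighbour upscaling on nested lists. The function name upscale_nearest is hypothetical; in the actual fix, the NumPy repeat calls quoted above perform the equivalent operation on the rgb_frame array.

```python
def upscale_nearest(frame, scale):
    """Repeat every pixel `scale` times along both rows and columns."""
    upscaled = []
    for row in frame:
        # Widen the row: each pixel appears `scale` times in a row.
        wide_row = [pixel for pixel in row for _ in range(scale)]
        # Then stack `scale` independent copies of the widened row.
        upscaled.extend(list(wide_row) for _ in range(scale))
    return upscaled


# A tiny 2x2 RGB frame stands in for a substrate observation.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
big = upscale_nearest(frame, 3)
print(len(big), len(big[0]))  # 6 6
```

Each source pixel becomes a scale-by-scale block, so no new colors are introduced and the frame simply becomes larger.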

Logging

You can use either Wandb or TensorBoard to log and visualize your training progress. The provided installation setup includes support for both.

WanDB Logging

To set up Wandb:

Create an account on the Wandb website.
Get the API key from your account and set the corresponding environment variable with export WANDB_API_KEY=<Your Key>.
Enable Wandb logging during training with python run_ray_train.py --wandb=True.
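Before enabling Wandb logging, it can help to confirm that the API key is actually visible to the process. The following is a minimal sketch using only the standard library; the helper name wandb_ready is an assumption and is not part of this repository or of the wandb package.

```python
import os


def wandb_ready(environ=None):
    """Return True if a non-empty WANDB_API_KEY is present in the environment."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("WANDB_API_KEY", "").strip())


if not wandb_ready():
    print("WANDB_API_KEY is not set; export it before passing --wandb=True")
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise without touching the real environment.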

Tensorboard Logging

To visualize your results with TensorBoard, run: tensorboard --logdir <results_dir>

Identified issues with Ray 2.6.1

During our training, we found issues with both the tf and torch backends that lead to errors when using the default LSTM wrapper provided by RLlib. Our installation script above applies fixes for these through ray_patch.sh. If you use the manual installation approach, apply the following fixes after installation:

For tf users:

In your Python library folder, in the file ray/rllib/policy/sample_batch.py, replace line 636 with the following snippet:

time_lengths = tree.map_structure(lambda x: len(x), data[next(iter(data))])
flattened_lengths = tree.flatten(time_lengths)
assert all(t == flattened_lengths[0] for t in flattened_lengths)
data_len = flattened_lengths[0]
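As we read it, the tf snippet above takes the first entry of the batch, measures the length of every leaf in that (possibly nested) entry, and requires all leaves to agree before using that length as the batch length. The sketch below illustrates the same idea in plain Python, assuming lists as leaves and dicts or tuples as containers; leaf_lengths and batch_length are hypothetical stand-ins and do not reproduce the dm-tree functions used in the patch.

```python
def leaf_lengths(structure):
    """Collect len() of every leaf in nested dicts/tuples (lists are leaves)."""
    if isinstance(structure, dict):
        return [n for value in structure.values() for n in leaf_lengths(value)]
    if isinstance(structure, tuple):
        return [n for item in structure for n in leaf_lengths(item)]
    return [len(structure)]


def batch_length(data):
    """Length shared by all leaves of the first entry in `data`."""
    first_value = data[next(iter(data))]
    lengths = leaf_lengths(first_value)
    if any(n != lengths[0] for n in lengths):
        raise ValueError(f"inconsistent leaf lengths: {lengths}")
    return lengths[0]


# The first entry is a nested observation with two leaves of four steps each.
data = {
    "obs": {"image": [0, 0, 0, 0], "position": [1, 1, 1, 1]},
    "rewards": [0.0, 0.0, 1.0, 0.0],
}
print(batch_length(data))  # 4
```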

For torch users:

In your Python library folder, in the file ray/rllib/models/torch/complex_input_net.py replace line 181 with:

self.num_outputs = concat_size if not self.post_fc_stack else self.post_fc_stack.num_outputs

Submission Guide of Clean_up

Submit on Canvas.
Put your policy and model weights in baselines.
Upload your baselines directory with a README describing how to evaluate your policy and your results.
Put baselines inside the clean_up folder of your overall submission directory.

Other information:

For the evaluation guide, please see the Run Evaluation section.
We will use baselines/evaluation/evaluate.py to test your model, so you should not modify this file. If you want to implement a rule-based policy, you can modify baselines/customs/policies.py, but do not change any function's inputs. (If you need to change them, contact the TAs.)

Substrate	Scenarios
clean_up	clean_up_7
prisoners_dilemma_in_the_matrix__repeated	prisoners_dilemma_in_the_matrix__repeated_0
	prisoners_dilemma_in_the_matrix__repeated_1
	prisoners_dilemma_in_the_matrix__repeated_2
	prisoners_dilemma_in_the_matrix__repeated_3

ziyuwan/AI3617-2024-Spring-Melting-Pot-Contest
