Experiments

This README gives an overview of the main job-launching components.

run_baselines.py

This is a command line tool to run experiments with baselines and produce reports.

Experimenting with a baseline

This command-line tool executes experiments specified in JSON files.
The Python file defines experiment functions, each named experiment_<method>, where <method> is the method name used in the JSON files.
Methods can take arguments, which are also specified in the JSON.
Here's an example that runs GPT-4o mini (without context) and then Lag-Llama, one after the other.

[
    {"label": "GPT-4o-mini (no ctx)", "method": "gpt", "llm": "gpt-4o-mini", "use_context": false},
    {"label": "Lag-Llama", "method": "lag_llama"}
]
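
To make the naming convention concrete, here is a minimal sketch of how a spec entry could be mapped to an experiment_<method> function. It is an illustration only, not the actual code of run_baselines.py: the function bodies are stand-ins, run_spec is a hypothetical helper, and it assumes that "label" and "method" are consumed by the launcher while every other key is passed to the function as a keyword argument.

```python
import json

# Stand-in experiment functions (hypothetical bodies). They only follow the
# experiment_<method> naming convention described above.
def experiment_gpt(llm, use_context=True):
    return f"gpt:{llm}:ctx={use_context}"

def experiment_lag_llama():
    return "lag_llama"

# Lookup table from function name to function.
FUNCTIONS = {f.__name__: f for f in (experiment_gpt, experiment_lag_llama)}

def run_spec(spec_text, functions):
    """Run each entry of a JSON spec in order and collect results by label."""
    results = {}
    for entry in json.loads(spec_text):
        entry = dict(entry)            # copy so the parsed spec is not mutated
        label = entry.pop("label")     # assumption: used for reporting only
        method = entry.pop("method")   # selects experiment_<method>
        func = functions[f"experiment_{method}"]
        results[label] = func(**entry) # remaining keys become keyword arguments
    return results

SPEC = """[
    {"label": "GPT-4o-mini (no ctx)", "method": "gpt", "llm": "gpt-4o-mini", "use_context": false},
    {"label": "Lag-Llama", "method": "lag_llama"}
]"""

results = run_spec(SPEC, FUNCTIONS)
```

Under these assumptions, an unknown method name fails with a KeyError at lookup time, and an unexpected argument fails with a TypeError when the function is called.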

Pass such a file to the command via the --exp-spec argument, e.g., python run_baselines.py --exp-spec myfile.json
The command-line tool also lets you specify the output directory (for plots) and the number of samples to draw from the models.
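
If you generate spec files from a script, something like the following sketch may help. It writes the example spec to a JSON file and builds the matching command line. The write_spec helper is hypothetical, and the check that every entry has "label" and "method" is an assumption based on the example above, not a documented requirement. Only the --exp-spec flag is used, since that is the only flag name given here.

```python
import json
import tempfile
from pathlib import Path

experiments = [
    {"label": "GPT-4o-mini (no ctx)", "method": "gpt", "llm": "gpt-4o-mini", "use_context": False},
    {"label": "Lag-Llama", "method": "lag_llama"},
]

def write_spec(entries, path):
    """Write the entries as JSON and return the command that would run them."""
    for i, entry in enumerate(entries):
        # Assumption: each entry needs at least a label and a method.
        missing = {"label", "method"} - entry.keys()
        if missing:
            raise ValueError(f"entry {i} is missing keys: {sorted(missing)}")
    Path(path).write_text(json.dumps(entries, indent=4))
    return ["python", "run_baselines.py", "--exp-spec", str(path)]

# Write to a temporary directory so the sketch does not touch the repository.
spec_path = Path(tempfile.mkdtemp()) / "myfile.json"
command = write_spec(experiments, spec_path)
print(" ".join(command))
```

The returned list can be handed to subprocess.run from the experiments directory; the sketch stops short of launching anything.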

Summary table

To generate a summary table with the results of multiple methods, create a JSON file listing all the experiments you care about and call the command-line tool with it. It iterates through every task/seed combination for all experiments (which should be fast thanks to caching) and produces the final table. You can add the --skip-cache-miss argument to skip any result that has not been computed yet (e.g., GPT failed due to some error but you don't want to re-query it right now).
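
The effect of skipping cache misses can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the cache is a plain dict keyed by (method, task, seed), compute is a placeholder for running a model, and the table simply averages scores per method. The real tool's cache format and aggregation may differ.

```python
from itertools import product

def summarize(methods, tasks, seeds, cache, compute, skip_cache_miss=False):
    """Average score per method over all task/seed pairs, reusing cached results."""
    table = {}
    for method in methods:
        scores = []
        for task, seed in product(tasks, seeds):
            key = (method, task, seed)
            if key not in cache:
                if skip_cache_miss:
                    continue  # leave the missing result out instead of recomputing
                cache[key] = compute(method, task, seed)
            scores.append(cache[key])
        # None marks a method with no available results at all.
        table[method] = sum(scores) / len(scores) if scores else None
    return table

# Toy cache: the result for ("gpt", "t1", 2) is missing.
cache = {
    ("gpt", "t1", 1): 0.5,
    ("lag_llama", "t1", 1): 0.25,
    ("lag_llama", "t1", 2): 0.75,
}
table = summarize(
    ["gpt", "lag_llama"], ["t1"], [1, 2], cache,
    compute=lambda method, task, seed: 1.0,  # placeholder for a real model run
    skip_cache_miss=True,
)
```

With skip_cache_miss=True the placeholder compute is never called, so the "gpt" row is built from the single cached result. With the flag off, the missing entry would be computed and stored in the cache first.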
