Skip to content

zjyuan1208/CPC_AM_master

Repository files navigation

RL-CSL: A Combinatorial Optimization Method Using Reinforcement Learning and Contrastive Self-supervised Learning

Learning to solve the Travelling Salesman Problem (TSP) and the Vehicle Routing Problem (VRP).

Dependencies

Start

For training TSP instances with 20 nodes and using rollout as REINFORCE baseline:

python run.py --graph_size 20 --baseline rollout --run_name 'tsp20_rollout'

Usage

Generating data

Training data is generated on the fly. To generate validation and test data (same as used in the paper) for all problems:

python generate_data.py --problem all --name validation --seed 4321
python generate_data.py --problem all --name test --seed 1234

Training

For training TSP instances with 20 nodes and using rollout as REINFORCE baseline and using the generated validation set:

python run.py --graph_size 20 --baseline rollout --run_name 'tsp20_rollout' --val_dataset data/tsp/tsp20_validation_seed4321.pkl

Multiple GPUs

By default, training will happen on all available GPUs. To disable CUDA at all, add the flag --no_cuda. Set the environment variable CUDA_VISIBLE_DEVICES to only use specific GPUs:

CUDA_VISIBLE_DEVICES=2,3 python run.py 

Note that using multiple GPUs has limited efficiency for small problem sizes (up to 50 nodes).

Warm start

You can initialize a run using a pretrained model by using the --load_path option:

python run.py --graph_size 100 --load_path pretrain/tsp_100/epoch-final.pt

The --load_path option can also be used to load an earlier run, in which case also the optimizer state will be loaded:

python run.py --graph_size 20 --load_path 'outputs/tsp_20/tsp20_rollout_{datetime}/epoch-0.pt'

The --resume option can be used instead of the --load_path option, which will try to resume the run, e.g. load additionally the baseline state, set the current epoch/step counter and set the random number generator state.

Evaluation

To evaluate a model, you can add the --eval-only flag to run.py, or use eval.py, which will additionally measure timing and save the results:

python eval.py data/tsp/tsp20_test_seed1234.pkl --model pretrain/tsp_20 --decode_strategy greedy

If the epoch is not specified, by default the last one in the folder will be used.

Sampling

To report the best of 1280 sampled solutions, use

python eval.py data/tsp/tsp20_test_seed1234.pkl --model pretrain/tsp_20 --decode_strategy sample --width 1280 --eval_batch_size 1

Beam Search (not in the paper) is also recently added and can be used using --decode_strategy bs --width {beam_size}.

To run baselines

Baselines for different problems are within the corresponding folders and can be ran (on multiple datasets at once) as follows

python -m problems.tsp.tsp_baseline farthest_insertion data/tsp/tsp20_test_seed1234.pkl data/tsp/tsp50_test_seed1234.pkl data/tsp/tsp100_test_seed1234.pkl

To run baselines, you need to install Compass by running the install_compass.sh script from within the problems/op directory and Concorde using the install_concorde.sh script from within problems/tsp. LKH3 should be automatically downloaded and installed when required. To use Gurobi, obtain a (free academic) license and follow the installation instructions.

Pre-train

The pre-trained model is in the "pretrain" file.

Other options and help

python run.py -h
python eval.py -h

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published