We use the Perlmutter supercomputer to evaluate the performance of Atlas, and a single CPU core (a laptop is sufficient) to evaluate the Stage and Kernelize algorithms.
Atlas is built on Quartz. To run the artifact, please create a Python environment for Quartz and copy the necessary circuits first. This needs to be done on both the single-core CPU and Perlmutter.
- Conda 23.3 or later (module load conda on Perlmutter)
- CMake 3.18 or later
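The version requirements above can be checked before starting. The following is a minimal sketch, not part of the artifact: the helper names version_ge and check_tool are hypothetical, and it assumes a sort that supports version ordering (-V, as in GNU coreutils) and that each tool prints its version number in response to --version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the Atlas artifact.

# version_ge A B: succeed if dotted version A is at least B.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MINIMUM: report whether NAME is installed and new enough.
check_tool() {
    local name="$1" minimum="$2" found
    if ! command -v "$name" >/dev/null 2>&1; then
        echo "$name: not found (need >= $minimum)"
        return 1
    fi
    # Take the first dotted number printed by `NAME --version`.
    found="$("$name" --version 2>/dev/null | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "$found" "$minimum"; then
        echo "$name $found: ok"
    else
        echo "$name ${found:-unknown}: too old (need >= $minimum)"
        return 1
    fi
}

# Minimum versions taken from the list above.
check_tool conda 23.3 || true
check_tool cmake 3.18 || true
```

On Perlmutter, run module load conda first so that conda is on the PATH when the check runs.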
The following parts assume that these commands have been executed (please replace YOUR_ACCOUNT with your account name):
# clone this repo
git clone git@github.com:quantum-compiler/atlas-artifact.git --recursive
# Create Python environment
cd atlas-artifact/deps/quartz
conda env create --name quartz --file env.yml
conda activate quartz
pip install matplotlib
# Build Quartz
mkdir build
cd build
cmake ..
# Install the HiGHS solver
cd ../external/HiGHS
mkdir build
cd build
cmake ..
make -j 12
cd ../../../../..
# now back in the atlas-artifact directory
export ATLAS_HOME=${The_directory_running_git_clone}/atlas-artifact
export PATH=$PATH:$ATLAS_HOME/deps/quartz/external/HiGHS/build/bin
# The above needs to be done on both the single-core CPU and Perlmutter.
# The following only needs to be done on Perlmutter.
# Replace the account name (not necessary if you are only viewing the results and not reproducing any experiments)
cd perlmutter/e2e
python replace_account_name.py
YOUR_ACCOUNT
# (input your account name above)
cd ../..
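Since later steps depend on ATLAS_HOME and on the HiGHS binary being on PATH, a quick check after the setup can catch a missed export. The sketch below is illustrative only: check_atlas_env is a hypothetical helper, and it assumes the HiGHS build produces an executable named highs in the build/bin directory added to PATH above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the Atlas artifact.

# Succeed only if ATLAS_HOME points at a checkout containing deps/quartz
# and an executable named `highs` (assumed name) is on PATH.
check_atlas_env() {
    local status=0
    if [ -z "${ATLAS_HOME:-}" ]; then
        echo "ATLAS_HOME is not set"
        status=1
    elif [ ! -d "$ATLAS_HOME/deps/quartz" ]; then
        echo "ATLAS_HOME=$ATLAS_HOME has no deps/quartz directory"
        status=1
    fi
    if ! command -v highs >/dev/null 2>&1; then
        echo "highs is not on PATH"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "environment looks ready"
    fi
    return "$status"
}

check_atlas_env || true
```

The same check is useful on both the single-core CPU and Perlmutter, because the exports do not persist across shell sessions unless they are added to a startup file.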
To plot the existing results in Figure 8:
# in quartz conda environment
cd staging_bench
python ilp_plot.py
cd ..
To run the experiment and reproduce the results (takes ~14 hours on a single-core CPU):
# in quartz conda environment
cd deps/quartz/build
# make test_remove_swap
# ./test_remove_swap
make benchmark_ilp_num_stages
cd ..
# yes | cp -r ../../circuit/* circuit/
./build/benchmark_ilp_num_stages
cp ilp_result.csv ../../staging_bench
cd ../..
To plot the existing results in Figure 9:
# in quartz conda environment
cd kernelization_bench
python dp_plot.py
cd ..
To run the experiment and reproduce the results (takes ~13 hours on a single-core CPU):
# in quartz conda environment
cd deps/quartz/build
make benchmark_dp
./benchmark_dp
cd ..
cp dp_result.csv ../../kernelization_bench
cd ../..
We run the end-to-end experiments on Perlmutter.
To plot the existing results in Figures 5 and 10:
# in quartz conda environment
cd perlmutter/e2e/logs
python plot.py
cd ../../..
Following are the instructions to run the experiment and reproduce the results.
- Set related environment variables in config/config.linux (please download cuQuantum if needed). Atlas supports two simulation modes. The first is distributed GPU-based simulation (USE_LEGION=OFF). The second is CPU-offload-enabled simulation (USE_LEGION=ON), which supports simulating more qubits on a single machine. Note that the second mode has not been tested for multi-node execution.
In this section (end-to-end experiments), please make sure that the distributed GPU-based simulation is used (USE_LEGION=OFF).
In addition, please make sure that the HiGHS solver is installed, the environment variable ATLAS_HOME is set, and PATH is updated to $PATH:$ATLAS_HOME/deps/quartz/external/HiGHS/build/bin.
- Create a Python 3.8 environment with PuLP (with HiGHS solver), pybind11, and Qiskit (Qiskit is not necessary for the end-to-end experiments but necessary for the DRAM offloading experiments):
conda create --name pulp python=3.8
conda activate pulp
pip install git+https://github.com/coin-or/[email protected]
pip install qiskit pybind11
- Build and install:
# in pulp conda environment
mkdir build
cd build
bash ../config/config.linux
make -j 12
cd ..
- Run the sbatch scripts:
# in pulp conda environment
cd perlmutter/e2e
sbatch srun-1-quartz.sh # takes around 2 minutes (in background)
sbatch srun-2-quartz.sh # takes around 1 minute
sbatch srun-4-quartz.sh # takes around 2 minutes
sbatch srun-8-quartz.sh # takes around 2 minutes
sbatch srun-16-quartz.sh # takes around 2 minutes
# Following are additional experiments
# sbatch srun-32-quartz.sh # takes around 2 minutes
# sbatch srun-64-quartz.sh # takes around 3 minutes
- Please do not rebuild Atlas with a different option (USE_LEGION=OFF/USE_LEGION=ON) while submitted jobs are still pending. For example, if you submit the jobs with USE_LEGION=OFF and then build Atlas with USE_LEGION=ON (or vice versa), overwriting the previous executable, the jobs submitted with USE_LEGION=OFF may not run properly.
- In some rare cases (observed just before some maintenance of Perlmutter), communication on Perlmutter may take as long as 30 seconds when it should take less than one second. If you observe an unreasonably long running time, please simply run the script again.
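The rebuild caveat above can be turned into a small pre-build check. The sketch below is an illustration, not part of the artifact: the helper names legion_mode, pending_job_count, and safe_to_rebuild are hypothetical, and it assumes that the mode appears literally as USE_LEGION=ON or USE_LEGION=OFF on a non-comment line of config/config.linux, which may not match the actual file layout.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the Atlas artifact.

# Print ON or OFF for the last non-comment USE_LEGION setting in a config file.
# Assumption: the setting is written literally as USE_LEGION=ON or USE_LEGION=OFF.
legion_mode() {
    grep -v '^[[:space:]]*#' "$1" | grep -oE 'USE_LEGION=(ON|OFF)' | tail -n 1 | cut -d= -f2
}

# Count this user's Slurm jobs that are still waiting in the queue.
pending_job_count() {
    squeue --noheader --user="${USER:-$(id -un)}" --states=PENDING --format=%i \
        | wc -l | tr -d ' '
}

# Succeed only when no submitted job is still pending.
safe_to_rebuild() {
    local n
    n="$(pending_job_count)"
    if [ "$n" -gt 0 ]; then
        echo "$n job(s) still pending; rebuilding now could overwrite their executable"
        return 1
    fi
    echo "no pending jobs; safe to rebuild"
}

# Usage: run from the atlas-artifact directory on a Slurm login node.
if command -v squeue >/dev/null 2>&1 && [ -f config/config.linux ]; then
    echo "config selects USE_LEGION=$(legion_mode config/config.linux)"
    safe_to_rebuild || true
fi
```

The check only looks at pending jobs, matching the caveat as stated; it does not say anything about jobs that are already running.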
- Download our modified HyQuas from the perlmutter branch of the Repo:
cd $ATLAS_HOME
cd ..
# It is recommended to let HyQuas and atlas-artifact share the same parent directory.
git clone -b perlmutter https://github.com/caoshiyi/HyQuas --recursive
- Configure the environment:
module load cudatoolkit/11.7
module load nccl/2.15.5
module load gcc/11.2.0
module load cray-mpich/8.1.25
export HYQUAS_ROOT=${The_directory_running_git_clone}/HyQuas
cd ${HYQUAS_ROOT}
bash ./update_cutt_makefile.sh
- Compile the cutt lib:
cd ${HYQUAS_ROOT}/third-party/cutt
make -j
- To reproduce the results we displayed in the paper, build HyQuas using:
cd ${HYQUAS_ROOT}/scripts
source ../scripts/init.sh -DBACKEND=mix -DSHOW_SUMMARY=on -DSHOW_SCHEDULE=off -DMICRO_BENCH=on -DUSE_DOUBLE=on -DDISABLE_ASSERT=off -DENABLE_OVERLAP=on -DMEASURE_STAGE=off -DEVALUATOR_PREPROCESS=on -DUSE_MPI=on -DMAT=7
- Use the scripts *-hyquas.sh under atlas-artifact/perlmutter/e2e to run the experiments on different numbers of GPUs:
# assume HyQuas and atlas-artifact share the same parent directory
cd ../../atlas-artifact/perlmutter/e2e
sbatch srun-1-hyquas.sh # takes around 3 minutes (in background)
sbatch srun-2-hyquas.sh # takes around 2 minutes
sbatch srun-4-hyquas.sh # takes around 2 minutes
sbatch srun-8-hyquas.sh # takes around 3 minutes
sbatch srun-16-hyquas.sh # takes around 3 minutes
# Following are additional experiments
# sbatch srun-32-hyquas.sh # takes around 4 minutes
# sbatch srun-64-hyquas.sh # takes around 10 minutes
- Make sure that the account name in perlmutter/e2e/cuQuantum.sh was replaced when you ran the script at the beginning of this document.
- Run:
# conda environment is not necessary
# cd perlmutter/e2e
bash cuQuantum.sh 1 1 28 # takes around 1 minute (in foreground)
bash cuQuantum.sh 1 2 29 # takes around 1 minute
bash cuQuantum.sh 1 4 30 # takes around 2 minutes
bash cuQuantum.sh 2 4 31 # takes around 2 minutes
bash cuQuantum.sh 4 4 32 # takes around 2 minutes
bash cuQuantum.sh 8 4 33 # takes around 2 minutes
bash cuQuantum.sh 16 4 34 # takes around 2 minutes
# Following are additional experiments
# bash cuQuantum.sh 32 4 35 # takes around 2 minutes
# bash cuQuantum.sh 64 4 36 # takes around 4 minutes
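The argument triples above follow a regular pattern: each doubling of the GPU count adds one qubit to a 28-qubit baseline. The sketch below reproduces the triples under the assumption that the three arguments are the number of nodes, the number of GPUs per node, and the total number of qubits; this reading is inferred from the values and is not stated by the script's documentation. The helper total_qubits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the Atlas artifact.

# total_qubits NODES GPUS_PER_NODE [LOCAL_QUBITS]
# Prints LOCAL_QUBITS + log2(NODES * GPUS_PER_NODE), with 28 as the default
# number of local qubits. The GPU count is assumed to be a power of two.
total_qubits() {
    local gpus=$(( $1 * $2 )) local_qubits="${3:-28}" extra=0
    while [ "$gpus" -gt 1 ]; do
        gpus=$(( gpus / 2 ))
        extra=$(( extra + 1 ))
    done
    echo $(( local_qubits + extra ))
}

# Print (without running) the command line for each configuration listed above.
for cfg in "1 1" "1 2" "1 4" "2 4" "4 4" "8 4" "16 4"; do
    read -r nodes gpus_per_node <<< "$cfg"
    echo "bash cuQuantum.sh $nodes $gpus_per_node $(total_qubits "$nodes" "$gpus_per_node")"
done
```

The loop only echoes the commands, so it can be used to double-check a configuration before submitting it.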
- Make sure that the account name in perlmutter/e2e/Qiskit.sh was replaced when you ran the script at the beginning of this document.
- Run:
# in quartz conda environment
# cd perlmutter/e2e
bash Qiskit.sh 1 1 28 # takes around 3 minutes (in foreground)
bash Qiskit.sh 1 2 29 # takes around 12 minutes
bash Qiskit.sh 1 4 30 # takes around 47 minutes, recommended to allocate a node first
We run the DRAM offloading experiments on Perlmutter.
To plot the existing results in Figures 6 and 7:
# in quartz conda environment
cd $ATLAS_HOME
cd perlmutter/offload
python plot_offload.py
cd ../..
Following are the instructions to run the experiment and reproduce the results.
- Set related environment variables in config/config.linux (setting USE_LEGION=ON), and use a Python 3.8 environment with PuLP and Qiskit.
- Make sure the setenv("PYTHONPATH", ...) in examples/legion-based/test_sim_legion.cc is pointing to the correct location.
- Build:
cd build
bash ../config/config.linux
make -j 12
cd ../perlmutter/offload
- Either run in interactive mode (please replace YOUR_ACCOUNT with your account name):
salloc --nodes 1 -q regular --time 00:30:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
conda activate pulp && time bash offload.sh && exit # takes around 22 minutes
cd ../..
or run in sbatch:
sbatch srun_offload.sh # takes around 22 minutes in background
cd ../..
- Download and build QDAO v0.1.0 (assuming qdao/ and atlas-artifact/ share the same parent directory):
cd ..
# at the parent directory of atlas-artifact now
git clone https://github.com/Zhaoyilunnn/qdao.git
cd qdao
git checkout tags/v0.1.0
conda create --name qdao python=3.9
conda activate qdao
pip install .
- Comment out this line in the QDAO directory you just downloaded, and append these two lines after this line:
self._sim.set_options(blocking_enable=True)
self._sim.set_options(blocking_qubits=28)
- Copy the scripts to the QDAO directory and run in interactive mode (please replace YOUR_ACCOUNT with your account name):
cp ../atlas-artifact/perlmutter/offload/run_qdao.py .
cp ../atlas-artifact/perlmutter/offload/run_qdao.sh .
salloc --nodes 1 -q regular --time 01:20:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
conda activate qdao
LD_LIBRARY_PATH="" time bash run_qdao.sh 0 && exit # takes around 60 minutes
The results are stored in qdao/logs.
- Copy the results back:
cp logs/* ../atlas-artifact/perlmutter/offload/logs/qdao-qiskit
We include all circuits used in evaluation in this repository so there is no need to generate them again. These instructions are only for your information.
MQT Bench:
- Download the circuits from https://www.cda.cit.tum.de/mqtbench/. Choose scalable benchmarks and "Target-independent Level"->"Qiskit".
- Copy the circuits to $ATLAS_HOME/circuit/MQTBench_${number_of_qubits}q and replace the SWAP gates with logical qubit swaps, because some previous work does not support SWAP gates and this replacement does not affect the result.
(You may need to edit $ATLAS_HOME/deps/quartz/src/test/test_remove_swap.cpp if you use other circuits that are not in this repository.)
cd deps/quartz/build
make test_remove_swap
./test_remove_swap
NWQBench:
- Clone the repository:
git clone git@github.com:pnnl/nwqbench.git
cd nwqbench
git checkout 3c322b789f5a26636d368253817c8d3f4676ae52 # optional
- Manually remove the line "pip==21.1.2" from requirements.txt (optional) and then follow the instructions of NWQBench to install it (i.e., pip install -r requirements.txt).
- Run the following commands to generate the qsvm circuits:
cd NWQ_Bench
cd qsvm
python qsvm_raw.py 28
python qsvm_raw.py 29
# The 30-qubit circuit already exists
python qsvm_raw.py 31
python qsvm_raw.py 32
python qsvm_raw.py 33
python qsvm_raw.py 34
python qsvm_raw.py 35
python qsvm_raw.py 36
python qsvm_raw.py 37
python qsvm_raw.py 38
python qsvm_raw.py 42
Do the same for ising with ising.py, and for vqc with vqc_raw.py.
- The resulting circuits are in the folder (path/to/nwqbench)/NWQ_Bench/qsvm/qasm/ (and similarly for the others).
- Copy the circuits to $ATLAS_HOME/circuit/NWQBench and replace the SWAP gates with logical qubit swaps, because some previous work does not support SWAP gates and this replacement does not affect the result.
(You may need to edit $ATLAS_HOME/deps/quartz/src/test/test_remove_swap.cpp if you use other circuits that are not in this repository.)
cd deps/quartz/build
make test_remove_swap
./test_remove_swap
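The per-size qsvm_raw.py invocations listed earlier can be condensed into a loop. This is a minimal sketch rather than part of NWQBench or the artifact: generate_circuits is a hypothetical helper, and the PYTHON override is an illustrative convenience for dry runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of NWQBench or the Atlas artifact.

# generate_circuits SCRIPT SIZE...: run SCRIPT once per qubit count.
# PYTHON can be overridden, for example PYTHON=echo for a dry run.
generate_circuits() {
    local script="$1" n
    shift
    for n in "$@"; do
        "${PYTHON:-python}" "$script" "$n" || return 1
    done
}

# Qubit counts taken from the qsvm command list; 30 is skipped because
# that circuit already exists.
sizes=(28 29 31 32 33 34 35 36 37 38 42)

# Usage: run inside NWQ_Bench/qsvm. Nothing happens elsewhere.
if [ -f qsvm_raw.py ]; then
    generate_circuits qsvm_raw.py "${sizes[@]}"
fi
```

The loop stops at the first failing size, which makes it easier to see which circuit could not be generated.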
This is done on a single thread of an Intel(R) Xeon(R) W-1350 @ 3.30GHz CPU.
- Follow the instructions at the beginning of this document to build Quartz and copy the circuits. Please also install the HiGHS solver in Quartz:
# in quartz conda environment
cd deps/quartz/external/HiGHS
mkdir build
cd build
cmake ..
make -j 12
cd ../../../../..
- In the Quartz Python environment, install PuLP (with HiGHS solver):
# in quartz conda environment
pip install git+https://github.com/coin-or/[email protected]
- Run preprocessing for 28 local qubits:
export ATLAS_HOME=${The_directory_running_git_clone}/atlas-artifact # if not already set
export PATH=$PATH:$ATLAS_HOME/deps/quartz/external/HiGHS/build/bin # if not already set
cd $ATLAS_HOME
cd perlmutter/e2e
bash preprocess.sh # takes around 17 minutes
- For different numbers of local qubits, please change local_values=(28) in perlmutter/e2e/preprocess.sh accordingly. For best results, please also adjust the numbers in the kernel_cost object in deps/quartz/src/test/test_simulation.cpp according to the benchmark results of 1-to-7-qubit fusion kernels and shared-memory kernels.