Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs

We evaluate the performance of Atlas on the Perlmutter supercomputer, and evaluate the Stage and Kernelize algorithms on a single CPU core (a laptop is sufficient).

Atlas is built on Quartz. To run the artifact, please create a Python environment for Quartz and copy the necessary circuits first. This needs to be done on both the single-core CPU and Perlmutter.

Prerequisites

Conda 23.3 or later (module load conda on Perlmutter)
CMake 3.18 or later
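
A quick way to confirm both prerequisites before cloning is sketched below. The helper names version_ge and check_tool are illustrative and not part of the artifact. The sketch assumes a sort that supports -V (GNU coreutils does) and that each tool prints its version number as the last field of the first line of its --version output.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MINIMUM: report whether NAME is installed and recent enough.
check_tool() {
  local name="$1" minimum="$2" found
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "$name: not found"
    return 0
  fi
  # Take the last field of the first line, e.g. "cmake version 3.26.4" -> 3.26.4
  found="$("$name" --version | head -n 1 | awk '{print $NF}')"
  if version_ge "$found" "$minimum"; then
    echo "$name $found: OK (>= $minimum)"
  else
    echo "$name $found: too old (need >= $minimum)"
  fi
}

check_tool conda 23.3
check_tool cmake 3.18
```

On Perlmutter, run this after module load conda so that conda is visible to the shell.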

The rest of this document assumes that the following commands have been executed (please replace YOUR_ACCOUNT with your account name):

# clone this repo
git clone git@github.com:quantum-compiler/atlas-artifact.git --recursive

# Create Python environment
cd atlas-artifact/deps/quartz
conda env create --name quartz --file env.yml
conda activate quartz
pip install matplotlib

# Build Quartz
mkdir build
cd build
cmake ..

# Install the HiGHS solver
cd ../external/HiGHS
mkdir build
cd build
cmake ..
make -j 12
cd ../../../..
export ATLAS_HOME=${The_directory_running_git_clone}/atlas-artifact
export PATH=$PATH:$ATLAS_HOME/deps/quartz/external/HiGHS/build/bin

# The above needs to be done on both the single-core CPU and Perlmutter.
# The following only needs to be done on Perlmutter.

# Replace the account name (not necessary if you are only viewing the results and not reproducing any experiments)
cd ../perlmutter/e2e
python replace_account_name.py
YOUR_ACCOUNT
# (input your account name above)
cd ../..

Circuit Staging

To plot the existing results in Figure 8:

# in quartz conda environment
cd staging_bench
python ilp_plot.py
cd ..

To run the experiment and reproduce the results (takes ~14 hours on a single-core CPU):

# in quartz conda environment
cd deps/quartz/build
# make test_remove_swap
# ./test_remove_swap
make benchmark_ilp_num_stages
cd ..
# yes | cp -r ../../circuit/* circuit/
./build/benchmark_ilp_num_stages
cp ilp_result.csv ../../staging_bench
cd ../..

Circuit Kernelization

To plot the existing results in Figure 9:

# in quartz conda environment
cd kernelization_bench
python dp_plot.py
cd ..

To run the experiment and reproduce the results (takes ~13 hours on a single-core CPU):

# in quartz conda environment
cd deps/quartz/build
make benchmark_dp
./benchmark_dp
cd ..
cp dp_result.csv ../../kernelization_bench
cd ../..

End-to-end experiments

We run the end-to-end experiments on Perlmutter.

To plot the existing results in Figures 5 and 10:

# in quartz conda environment
cd perlmutter/e2e/logs
python plot.py
cd ../../..

The following instructions run the experiments and reproduce the results.

Atlas

Set the related environment variables in config/config.linux (please download cuQuantum if needed). Atlas supports two simulation modes. The first is distributed GPU-based simulation (USE_LEGION=OFF). The second is simulation with CPU offloading enabled (USE_LEGION=ON), which supports simulating more qubits on a single machine. Note that the second mode has not been tested for multi-node execution.
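
To see why offloading to host memory allows more qubits on one machine, it helps to estimate the size of a full state vector, which holds 2^n complex amplitudes for n qubits. The sketch below assumes double-precision complex amplitudes (16 bytes each); that precision is an assumption made for illustration, and the function name state_vector_gib is hypothetical.

```shell
#!/usr/bin/env bash
# state_vector_gib QUBITS [BYTES_PER_AMPLITUDE]: size of a full state vector in GiB.
# Integer arithmetic, so sizes below 1 GiB print as 0.
state_vector_gib() {
  local qubits="$1" bytes="${2:-16}"
  echo $(( (1 << qubits) * bytes >> 30 ))
}

# Each additional qubit doubles the memory requirement.
for n in 28 30 32 34 36; do
  echo "$n qubits: $(state_vector_gib "$n") GiB"
done
```

Pass 8 as the second argument to estimate single-precision storage instead.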

In this section (end-to-end experiments), please make sure that the distributed GPU-based simulation is used (USE_LEGION=OFF).

In addition, please make sure that the HiGHS solver is installed, the environment variable ATLAS_HOME is set, and PATH is updated to $PATH:$ATLAS_HOME/deps/quartz/external/HiGHS/build/bin.
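
The three conditions above can be checked with a short sketch like the following. The function name check_atlas_env is illustrative, and the sketch assumes that the HiGHS build produces an executable named highs in its build/bin directory.

```shell
#!/usr/bin/env bash
# check_atlas_env: print one line per problem and return non-zero if any check fails.
check_atlas_env() {
  local status=0
  if [ -z "${ATLAS_HOME:-}" ]; then
    echo "ATLAS_HOME is not set"
    status=1
  elif [ ! -d "$ATLAS_HOME" ]; then
    echo "ATLAS_HOME does not point to a directory: $ATLAS_HOME"
    status=1
  fi
  # Assumption: the HiGHS executable is named `highs`.
  if ! command -v highs >/dev/null 2>&1; then
    echo "highs not found on PATH"
    status=1
  fi
  return "$status"
}

if check_atlas_env; then
  echo "environment looks ready"
fi
```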

Create a Python 3.8 environment with PuLP (with HiGHS solver), pybind11, and Qiskit (Qiskit is not necessary for the end-to-end experiments but necessary for the DRAM offloading experiments):

conda create --name pulp python=3.8
conda activate pulp
pip install git+https://github.com/coin-or/[email protected]
pip install qiskit pybind11

Build and install:

# in pulp conda environment
mkdir build
cd build
bash ../config/config.linux
make -j 12
cd ..

Run the sbatch scripts:

# in pulp conda environment
cd perlmutter/e2e
sbatch srun-1-quartz.sh  # takes around 2 minutes (in background)
sbatch srun-2-quartz.sh  # takes around 1 minute
sbatch srun-4-quartz.sh  # takes around 2 minutes
sbatch srun-8-quartz.sh  # takes around 2 minutes
sbatch srun-16-quartz.sh  # takes around 2 minutes
# Following are additional experiments
# sbatch srun-32-quartz.sh  # takes around 2 minutes
# sbatch srun-64-quartz.sh  # takes around 3 minutes
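
If you prefer to submit the five default jobs in one go, a loop along these lines may help. It only relies on the srun-N-quartz.sh naming used in the commands above; the helper name e2e_scripts is illustrative and not part of the artifact.

```shell
#!/usr/bin/env bash
# e2e_scripts SYSTEM [NUMBER...]: print the sbatch script names for the given
# numbers (default: the five used in the main experiments).
e2e_scripts() {
  local system="$1"; shift
  local counts=("$@")
  [ "${#counts[@]}" -gt 0 ] || counts=(1 2 4 8 16)
  local n
  for n in "${counts[@]}"; do
    echo "srun-${n}-${system}.sh"
  done
}

# Submit only when sbatch is available and the scripts are in the current directory.
if command -v sbatch >/dev/null 2>&1 && [ -f srun-1-quartz.sh ]; then
  for script in $(e2e_scripts quartz); do
    sbatch "$script"
  done
fi
```

Calling e2e_scripts quartz 32 64 lists the two additional experiments.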

Troubleshooting

Please do not rebuild Atlas with a different option (USE_LEGION=OFF or USE_LEGION=ON) while submitted jobs are still pending. For example, if you submit jobs with USE_LEGION=OFF and then build Atlas with USE_LEGION=ON (or vice versa), the new build overwrites the previous executable and the pending jobs may not run properly.
In rare cases (observed just before a Perlmutter maintenance), communication on Perlmutter may take as long as 30 seconds when it should take less than one second. If you observe an unreasonably long running time, please simply run the script again.
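
One way to avoid the first problem is to check for pending jobs before rebuilding. The sketch below uses the standard Slurm squeue options -h (no header), -u (user) and -t (state filter); the function names are illustrative.

```shell
#!/usr/bin/env bash
# pending_job_count: number of the current user's jobs still in the PENDING state.
pending_job_count() {
  squeue -h -u "${USER:-$(id -un)}" -t PENDING | wc -l | tr -d ' '
}

# safe_to_rebuild: succeed only when no jobs are pending.
safe_to_rebuild() {
  [ "$(pending_job_count)" -eq 0 ]
}

if command -v squeue >/dev/null 2>&1; then
  if safe_to_rebuild; then
    echo "no pending jobs; rebuilding with a different USE_LEGION setting should be safe"
  else
    echo "$(pending_job_count) job(s) still pending; wait before rebuilding"
  fi
fi
```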

HyQuas

Download our modified HyQuas from the perlmutter branch of its repository:

cd $ATLAS_HOME
cd ..
# It is recommended to let HyQuas and atlas-artifact share the same parent directory.
git clone -b perlmutter https://github.com/caoshiyi/HyQuas --recursive

Config Env

module load cudatoolkit/11.7
module load nccl/2.15.5
module load gcc/11.2.0
module load cray-mpich/8.1.25
export HYQUAS_ROOT=${The_directory_running_git_clone}/HyQuas
cd ${HYQUAS_ROOT}
bash ./update_cutt_makefile.sh

Compile the cutt lib:

cd ${HYQUAS_ROOT}/third-party/cutt
make -j

To reproduce the results we displayed in the paper, build HyQuas using:

cd ${HYQUAS_ROOT}/scripts
source ../scripts/init.sh -DBACKEND=mix -DSHOW_SUMMARY=on -DSHOW_SCHEDULE=off -DMICRO_BENCH=on -DUSE_DOUBLE=on -DDISABLE_ASSERT=off -DENABLE_OVERLAP=on -DMEASURE_STAGE=off -DEVALUATOR_PREPROCESS=on -DUSE_MPI=on -DMAT=7

Use the scripts *-hyquas.sh under atlas-artifact/perlmutter/e2e to run the experiments on different numbers of GPUs.

# assume HyQuas and atlas-artifact share the same parent directory
cd ../../atlas-artifact/perlmutter/e2e
sbatch srun-1-hyquas.sh  # takes around 3 minutes (in background)
sbatch srun-2-hyquas.sh  # takes around 2 minutes
sbatch srun-4-hyquas.sh  # takes around 2 minutes
sbatch srun-8-hyquas.sh  # takes around 3 minutes
sbatch srun-16-hyquas.sh  # takes around 3 minutes
# Following are additional experiments
# sbatch srun-32-hyquas.sh  # takes around 4 minutes
# sbatch srun-64-hyquas.sh  # takes around 10 minutes

cuQuantum

Make sure that the account name in perlmutter/e2e/cuQuantum.sh was replaced by running replace_account_name.py as described at the beginning of this document.
Run:

# conda environment is not necessary
# cd perlmutter/e2e
bash cuQuantum.sh 1 1 28  # takes around 1 minute (in foreground)
bash cuQuantum.sh 1 2 29  # takes around 1 minute
bash cuQuantum.sh 1 4 30  # takes around 2 minutes
bash cuQuantum.sh 2 4 31  # takes around 2 minutes
bash cuQuantum.sh 4 4 32  # takes around 2 minutes
bash cuQuantum.sh 8 4 33  # takes around 2 minutes
bash cuQuantum.sh 16 4 34 # takes around 2 minutes
# Following are additional experiments
# bash cuQuantum.sh 32 4 35 # takes around 2 minutes
# bash cuQuantum.sh 64 4 36 # takes around 4 minutes
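
The three arguments in the commands above appear to be the number of nodes, the number of GPUs per node, and the number of qubits; this reading is inferred from the pattern of the commands rather than from the script itself. Under that assumption, the qubit count is 28 plus the base-2 logarithm of the total GPU count, which the following sketch reproduces. The function name qubits_for is illustrative.

```shell
#!/usr/bin/env bash
# qubits_for NODES GPUS_PER_NODE [LOCAL_QUBITS]: local qubits plus log2(total GPUs).
# Assumes the total GPU count is a power of two, as in every command above.
qubits_for() {
  local total=$(( $1 * $2 )) local_qubits="${3:-28}" extra=0
  while [ "$total" -gt 1 ]; do
    total=$(( total / 2 ))
    extra=$(( extra + 1 ))
  done
  echo $(( local_qubits + extra ))
}

# Print the command line for each configuration without running it.
for config in "1 1" "1 2" "1 4" "2 4" "4 4" "8 4" "16 4"; do
  set -- $config
  echo "bash cuQuantum.sh $1 $2 $(qubits_for "$1" "$2")"
done
```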

Qiskit

Make sure that the account name in perlmutter/e2e/Qiskit.sh was replaced by running replace_account_name.py as described at the beginning of this document.
Run:

# in quartz conda environment
# cd perlmutter/e2e
bash Qiskit.sh 1 1 28  # takes around 3 minutes (in foreground)
bash Qiskit.sh 1 2 29  # takes around 12 minutes
bash Qiskit.sh 1 4 30  # takes around 47 minutes, recommended to allocate a node first

DRAM Offloading

We run the DRAM offloading experiments on Perlmutter.

To plot the existing results in Figures 6 and 7:

# in quartz conda environment
cd $ATLAS_HOME
cd perlmutter/offload
python plot_offload.py
cd ../..

The following instructions run the experiments and reproduce the results.

Atlas

Set related environment variables in config/config.linux (setting USE_LEGION=ON), and use a Python 3.8 environment with PuLP and Qiskit.
Make sure the setenv("PYTHONPATH", ...) in examples/legion-based/test_sim_legion.cc is pointing to the correct location.
Build:

cd build
bash ../config/config.linux
make -j 12
cd ../perlmutter/offload

Either run in interactive mode (please replace YOUR_ACCOUNT with your account name):

salloc --nodes 1 -q regular --time 00:30:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
conda activate pulp && time bash offload.sh && exit  # takes around 22 minutes
cd ../..

or run in sbatch:

sbatch srun_offload.sh # takes around 22 minutes in background
cd ../..

QDAO

Download and build QDAO v0.1.0 (assuming qdao/ and atlas-artifact/ share the same parent directory):

cd ..
# at the parent directory of atlas-artifact now
git clone https://github.com/Zhaoyilunnn/qdao.git
cd qdao
git checkout tags/v0.1.0
conda create --name qdao python=3.9
conda activate qdao
pip install .

Comment out this line in the QDAO directory you just downloaded, and append these two lines after this line:

        self._sim.set_options(blocking_enable=True)
        self._sim.set_options(blocking_qubits=28)

Copy the scripts to QDAO directory and run in interactive mode (please replace YOUR_ACCOUNT with your account name):

cp ../atlas-artifact/perlmutter/offload/run_qdao.py .
cp ../atlas-artifact/perlmutter/offload/run_qdao.sh .
salloc --nodes 1 -q regular --time 01:20:00 --constraint gpu --gpus-per-node 4 --account=YOUR_ACCOUNT
conda activate qdao
LD_LIBRARY_PATH="" time bash run_qdao.sh 0 && exit  # takes around 60 minutes

The results are stored in qdao/logs.

Copy the results back:

cp logs/* ../atlas-artifact/perlmutter/offload/logs/qdao-qiskit

How to generate and preprocess the circuits used in evaluation (optional)

We include all circuits used in evaluation in this repository so there is no need to generate them again. These instructions are only for your information.

Generating

MQT Bench:

Download the circuits from https://www.cda.cit.tum.de/mqtbench/. Choose scalable benchmarks and "Target-independent Level"->"Qiskit".
Copy the circuits to $ATLAS_HOME/circuit/MQTBench_${number_of_qubits}q and replace the SWAP gates with logical qubit swaps because some previous work does not support SWAP gates, and this replacement does not affect the result:

(You may need to edit $ATLAS_HOME/deps/quartz/src/test/test_remove_swap.cpp as needed if you use other circuits that are not in this repository.)

cd deps/quartz/build
make test_remove_swap
./test_remove_swap

NWQBench:

Clone the repository:

git clone git@github.com:pnnl/nwqbench.git
cd nwqbench
git checkout 3c322b789f5a26636d368253817c8d3f4676ae52  # optional

Manually remove the line "pip==21.1.2" from requirements.txt (optional) and then follow the instructions of NWQBench to install it (i.e., pip install -r requirements.txt).
Run the following commands to generate the qsvm circuits:

cd NWQ_Bench
cd qsvm
python qsvm_raw.py 28
python qsvm_raw.py 29
# The 30-qubit circuit already exists
python qsvm_raw.py 31
python qsvm_raw.py 32
python qsvm_raw.py 33
python qsvm_raw.py 34
python qsvm_raw.py 35
python qsvm_raw.py 36
python qsvm_raw.py 37
python qsvm_raw.py 38
python qsvm_raw.py 42

Generate the ising and vqc circuits in the same way, using ising.py in the ising directory and vqc_raw.py in the vqc directory.
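
The repeated qsvm_raw.py invocations can be condensed into a loop, sketched here. The helper name qsvm_sizes is illustrative; the list of sizes is copied from the qsvm commands above, and whether the other circuit families use exactly the same list is not stated in this document.

```shell
#!/usr/bin/env bash
# qsvm_sizes: the qubit counts generated above (30 is skipped because that
# circuit already exists).
qsvm_sizes() {
  echo 28 29 31 32 33 34 35 36 37 38 42
}

# Run the generator once per size, but only from a directory that contains the script.
if [ -f qsvm_raw.py ]; then
  for n in $(qsvm_sizes); do
    python qsvm_raw.py "$n"
  done
fi
```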

The resulting circuits are in the folder (path/to/nwqbench)/NWQ_Bench/qsvm/qasm/ (and similarly for the others).
Copy the circuits to $ATLAS_HOME/circuit/NWQBench and replace the SWAP gates with logical qubit swaps because some previous work does not support SWAP gates, and this replacement does not affect the result:

(You may need to edit $ATLAS_HOME/deps/quartz/src/test/test_remove_swap.cpp as needed if you use other circuits that are not in this repository.)

cd deps/quartz/build
make test_remove_swap
./test_remove_swap

Preprocessing

This is done on a single thread of an Intel(R) Xeon(R) W-1350 @ 3.30GHz CPU.

Follow the instructions at the beginning of this document to build Quartz and copy the circuits. Please also install the HiGHS solver in Quartz:

# in quartz conda environment
cd deps/quartz/external/HiGHS
mkdir build
cd build
cmake ..
make -j 12
cd ../../../../..

In the Quartz Python environment, install PuLP (with HiGHS solver):

# in quartz conda environment
pip install git+https://github.com/coin-or/[email protected]

Run preprocessing for 28 local qubits:

export ATLAS_HOME=${The_directory_running_git_clone}/atlas-artifact  # if not already set
export PATH=$PATH:$ATLAS_HOME/deps/quartz/external/HiGHS/build/bin  # if not already set
cd $ATLAS_HOME
cd perlmutter/e2e
bash preprocess.sh  # takes around 17 minutes

For different numbers of local qubits, please change local_values=(28) in perlmutter/e2e/preprocess.sh accordingly. For best results, please also adjust the numbers in the kernel_cost object in deps/quartz/src/test/test_simulation.cpp according to the benchmark results of 1-to-7-qubit fusion kernels and shared-memory kernels.
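
The local_values change can also be scripted. The sketch below assumes the assignment appears in preprocess.sh in the form local_values=(28) quoted above; the function name set_local_values is illustrative. It is demonstrated on a throwaway file so that the real script stays untouched.

```shell
#!/usr/bin/env bash
# set_local_values FILE VALUES: rewrite the `local_values=(...)` assignment in FILE.
# A backup copy is kept as FILE.bak.
set_local_values() {
  local file="$1" values="$2"
  sed -i.bak "s/local_values=([^)]*)/local_values=(${values})/" "$file"
}

# Example on a temporary file.
demo="$(mktemp)"
echo 'local_values=(28)' > "$demo"
set_local_values "$demo" "29 30"
cat "$demo"   # prints: local_values=(29 30)
rm -f "$demo" "$demo.bak"
```

To apply it to the artifact, call set_local_values "$ATLAS_HOME/perlmutter/e2e/preprocess.sh" 29 and then rerun preprocess.sh.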
