Official Implementation of CaP @ INFOCOM 23

This codebase contains the implementation of Communication-Aware DNN Pruning (INFOCOM 2023).

Introduction

We propose the Communication-aware Pruning (CaP) algorithm, a novel distributed inference framework for distributing DNN computations across a physical network. Unlike conventional pruning methods, CaP takes the physical network topology into account and produces communication-aware DNNs designed for both accurate and fast execution over such a distributed deployment. In experiments on two deep learning benchmark datasets, CIFAR-10 and CIFAR-100, CaP outperforms state-of-the-art competitors by up to 4% in accuracy. In experiments over real-world scenarios, it simultaneously reduces total execution time by 27% to 68% with a negligible performance decrease (less than 1%).

Environment Setup

Install Python 3.9.x or 3.10.x and create a virtual environment from the requirements.txt file.
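Because only two Python minor versions are listed as supported, a quick interpreter check before building the environment can save a failed install. The following is a small sketch (not part of the repository) that reports whether the running interpreter is 3.9.x or 3.10.x.

```python
import sys

# Versions this README lists as supported: Python 3.9.x and 3.10.x.
SUPPORTED = {(3, 9), (3, 10)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) part of version_info is supported."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported() else "NOT supported (use 3.9.x or 3.10.x)"
    print(f"Python {major}.{minor}: {status}")
```

Run it with the interpreter you intend to use (for example `python3.10 check_python.py`; the file name is hypothetical), then create the environment with the standard `python3.10 -m venv .venv`, activate it, and run `pip install -r requirements.txt`.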

Instructions

We provide a sample bash script that runs our method at a sparsity ratio of 0.75 on CIFAR-10.

To run CaP:

source env.sh
bash run_cifar10.sh
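The sample script prunes at a sparsity ratio of 0.75. Assuming that ratio denotes the fraction of weights set to zero (our reading; this README does not define it), the sketch below illustrates what it means using plain unstructured magnitude pruning on a toy weight list. Magnitude pruning is a generic technique swapped in purely for illustration; it is not CaP's communication-aware pruning criterion.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of `weights`.

    Generic unstructured magnitude pruning, used here only to illustrate
    what a sparsity ratio means; it is not the CaP algorithm.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08]
    p = magnitude_prune(w, 0.75)
    print(p)               # [0.9, 0.0, 0.0, -0.7, 0.0, 0.0, 0.0, 0.0]
    print(sparsity_of(p))  # 0.75
```

At 0.75 sparsity, six of the eight toy weights are zeroed and only the two largest in magnitude survive.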

Split Network Emulation

Model inference over a network can be emulated by starting multiple threads on a local or remote machine. Windows users need WSL or another bash emulator installed. To make a run:

Train and prune the model.
Save the model in the assets/models folder.
Create the network setup files (see config/resnet_4_network for an example):
i. config-leaf.json -- how to reach (IP and port) the leaf nodes of the network for input transmission
ii. ip-map.json -- the server address (IP and port) that each node listens on for incoming connections
iii. network-graph.json -- the network graph topology
Update local_network/start_servers.sh and local_network/start_server_helper.bat (Windows), or local_network/start_servers_linux.sh (Linux), with the following:
i. file paths to the network setup files
ii. the Python environment activation command
iii. the terminal/bash emulator (e.g., gnome-terminal or terminator)
iv. the model name
Set up the servers (from the ./CaP directory):

  # Windows (with WSL)
  bash local_network/start_servers.sh

  # Linux
  bash local_network/start_servers_linux.sh

Activate the Python environment.
Send inputs:

# Windows and Linux
python -m source.utils.send_start_message [path to config-leaf.json]

Outputs will appear in the logs/[dir log out] folder specified in the start servers script. Post-processing and visualization tools are in sandbox/plot_timing.ipynb.
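A mistyped node ID or a port clash in the three network setup files only shows up once the servers fail to connect, so a consistency check before starting them can help. The following is a hypothetical sketch, not part of the repository: the JSON layouts it assumes (node ID mapped to `[ip, port]`, node ID mapped to a neighbor list) are illustrative guesses, so compare them against config/resnet_4_network before relying on it.

```python
import json

# HYPOTHETICAL schema, assumed for illustration only:
#   ip-map.json:        {"<node id>": ["<ip>", <port>], ...}
#   network-graph.json: {"<node id>": ["<neighbor id>", ...], ...}
#   config-leaf.json:   {"<leaf id>": ["<ip>", <port>], ...}


def check_network_config(ip_map, graph, leaves):
    """Return a list of consistency problems; an empty list means none were found."""
    problems = []
    nodes = set(graph)
    # Every edge must point at a node that exists in the graph.
    for node, neighbors in graph.items():
        for nb in neighbors:
            if nb not in nodes:
                problems.append(f"edge {node}->{nb}: unknown node {nb}")
    # Every graph node needs a server address to listen on.
    for node in sorted(nodes):
        if node not in ip_map:
            problems.append(f"node {node} has no entry in ip-map")
    # Ports must be valid, and no two nodes may listen on the same ip:port.
    seen = {}
    for node, (ip, port) in ip_map.items():
        port = int(port)
        if not 0 < port < 65536:
            problems.append(f"node {node}: invalid port {port}")
        if (ip, port) in seen:
            problems.append(f"nodes {seen[(ip, port)]} and {node} share {ip}:{port}")
        seen[(ip, port)] = node
    # Inputs can only be sent to leaves that are part of the graph.
    for leaf in leaves:
        if leaf not in nodes:
            problems.append(f"leaf {leaf} is not in the network graph")
    return problems


def check_files(leaf_path, ip_map_path, graph_path):
    """Load the three JSON files and run check_network_config on them."""
    with open(leaf_path) as f:
        leaves = json.load(f)
    with open(ip_map_path) as f:
        ip_map = json.load(f)
    with open(graph_path) as f:
        graph = json.load(f)
    return check_network_config(ip_map, graph, leaves)


if __name__ == "__main__":
    # Toy two-node chain on localhost: node "0" is the leaf, node "1" the next hop.
    ip_map = {"0": ["127.0.0.1", 5000], "1": ["127.0.0.1", 5001]}
    graph = {"0": ["1"], "1": []}
    leaves = {"0": ["127.0.0.1", 5000]}
    print(check_network_config(ip_map, graph, leaves) or "no problems found")
```

Returning a list of problems instead of raising on the first one lets you fix every issue in the setup files in a single pass.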

Split Network Inference on Colosseum

Example Colosseum run procedure. This assumes the full and split model files have been loaded onto the file-proxy server at /share/nas/[team name]/CaP-Models/perm beforehand (TODO: generalize, add detail, and verify that it works):

Connect to the VPN via Cisco.
Make a reservation. Use the CAP-wifi-v1 container for WiFi nodes and JARVIS-server-cap1 for server nodes [TODO: make and test a container for UE and base station nodes].
While waiting for the SRN nodes to spin up, modify the bash scripts in the CaP/colosseum folder:
i. Manually configure colosseum/nodes.txt with the correct SRN numbers.
ii. In prep_run.sh, update the leaf node connection type [WARNING: not tested for heterogeneous networks] and the RF scenario (see the Colosseum documentation for details).
iii. In ./start_servers_colosseum.sh, select the model file and batch size, and specify the log output directory name.
iv. In ./start_run.sh, comment/uncomment commands based on the SRN type.
Open a bash session in the CaP/colosseum folder.
Transfer the repository to the SRN nodes, start the RF scenario, collect IP addresses, and build the JSON config:

bash ./prep_run.sh

Start the servers on the SRN nodes for split model execution (NOTE: ResNet-101 models take 1-2 minutes to load):

bash ./start_servers_colosseum.sh

Send the start message to the nodes:

bash ./start_run.sh [srn #]

Kill the servers (the script can also kill the RF scenario; see the script comments for details):

bash ./kill_servers.sh

Update the log file name in CaP/colosseum/start_servers_colosseum.sh so that run outputs stay separate, then repeat the start-servers, send-start-message, and kill-servers steps for the next run until all runs are finished.
After the reservation ends, inspect the log messages saved to Colosseum's file-proxy server under /share/nas/[team name].
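Since the SRN numbers in nodes.txt are entered by hand and every run needs its own log directory, small bookkeeping helpers can reduce mistakes across repeated runs. The following is a hypothetical sketch, not part of the repository: the nodes.txt format (one SRN number per line) and the `<base>_runN` naming scheme are assumptions, and the chosen name would still have to be set in start_servers_colosseum.sh by hand.

```python
# HYPOTHETICAL helpers for bookkeeping across repeated Colosseum runs.
# Assumed nodes.txt format: one SRN number per line; blank lines and
# anything after '#' are ignored. Adapt to the real file if it differs.


def parse_nodes(text):
    """Return the SRN numbers listed in the contents of nodes.txt, in order."""
    srns = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.isdigit():
            raise ValueError(f"line {lineno}: expected an SRN number, got {line!r}")
        srn = int(line)
        if srn in srns:
            raise ValueError(f"line {lineno}: duplicate SRN {srn}")
        srns.append(srn)
    return srns


def next_log_dir(base, existing):
    """Return the first of base_run1, base_run2, ... that is not in `existing`."""
    k = 1
    while f"{base}_run{k}" in existing:
        k += 1
    return f"{base}_run{k}"


if __name__ == "__main__":
    print(parse_nodes("12\n# base station\n\n34  # WiFi node\n"))  # [12, 34]
    print(next_log_dir("resnet18", {"resnet18_run1", "resnet18_run2"}))  # resnet18_run3
```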

Cite

@article{jian2023cap,
  title={Communication-Aware DNN Pruning},
  author={Jian, Tong and Roy, Debashri and Salehi, Batool and Soltani, Nasim and Chowdhury, Kaushik and Ioannidis, Stratis},
  journal={INFOCOM},
  year={2023}
}

neu-spiral/CaP2

Folders and files

Latest commit

History

Repository files navigation

Official Implementation of CaP @ INFOCOM 23

Introduction

Environment Setup

Instructions

Split Network Emulation

Split Network Inference on Colosseum

Cite

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages