Name		Name	Last commit message	Last commit date
parent directory ..
resources		resources
LICENSE		LICENSE
README.md		README.md
convselfattn.py		convselfattn.py
implement_CSA.py		implement_CSA.py

README.md

Convolutional Self-Attention (CSA)

Emulating attention mechanism in transformer models with a fully convolutional network

John Yang, Le An, Su Inn Park

Unlike other convolutional models that try to ingest the attention module from transformer model, Convolutional Self-Attention (CSA) explicitly finds relationships among features one-to-many with only convolutions in conjunction with simple tensor shape manipulations. As results, CSA operates without bells and hassles in TensorRT’s restricted mode, making it suitable for AV production for safety-critical applications.

Usage

We employ the same setup as that in ConvNeXt repository for general usages including training/testing. For details on environment preparation, data download, and training/evaluation scripts, please refer to the original repo for details.

Setting up CSA

For setting up, firstly git clone ConvNeXt repository and that of ours.

git clone https://github.com/facebookresearch/ConvNeXt.git
git clone https://github.com/NVIDIA/DL4AGX.git

In order to place and set-up Conv-Self-Attention model files within the ConvNeXt implementation, the following commands set up files in the appropriate locations for the training/testing commands to be run.

cp your/path/to/DL4AGX/ConvSelfAttention/convselfattn.py your/path/to/ConvNeXt/models
cp your/path/to/DL4AGX/ConvSelfAttention/implement_CSA.py your/path/to/ConvNeXt/
cd ConvNeXt

Once copying required files in the ConvNeXt repository, make sure the files in this git are located in the following directories of ConvNeXt:

ConvNeXt
 ├ models
 │ ...
 │ └ convselfattn.py
 ├ object_detection
 ├ semantic_segmentation
 │ ...
 └ implement_CSA.py

Then, run the following command in order for main.py to include the newly implemented files:

python implement_CSA.py

Training

For training the network with CSA modules, we established the distributed training via bcprun with NVIDIA Base Command Platform.

The training was dones with 2 8-GPU nodes updating every 2 epochs to follow the original training criterion of ConvNeXt's batch_size=4096.

bcprun --nnodes 2 --npernode 8 --cmd 'python main.py --model convselfattn --drop_path 0.1 \
--lr 4e-3 --batch_size 128 --update_freq 2 --use_amp True --model_ema true \
--model_ema_eval false --data_path your/path/to/dataset/ \
--output_dir /results --sync_BN True --warmup_epochs 20 --epochs 300'

Testing

Once the training is done, we provide an example evaluation command for a ImageNet-1K pre-trained CSA Network:

python main.py --model convselfattn --eval true --resume your/path/to/trained_model.pth

Our CSA network should give Top-1 accuracy of 81.30% for FP32 inferences if trained properly with the training command above.

License

The provided code can be used for research or other non-commercial purposes. For details please check the LICENSE file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ConvSelfAttention

ConvSelfAttention

README.md

Convolutional Self-Attention (CSA)

Emulating attention mechanism in transformer models with a fully convolutional network

Usage

Setting up CSA

Training

Testing

License

Files

ConvSelfAttention

Directory actions

More options

Directory actions

More options

Latest commit

History

ConvSelfAttention

Folders and files

parent directory

README.md

Convolutional Self-Attention (CSA)

Emulating attention mechanism in transformer models with a fully convolutional network

Usage

Setting up CSA

Training

Testing

License