knowledge distillation for keti project
This repository contains implementation of Knowlegde distillation on Cifar10 and ImageNet
all available models are in "cifar10/models"
- Vanilla KD is implemented
- You can change teacher and student pair by changing following line of code
if args.kd:
net = EfficientNetB0()
t_net = ResNeXt29_2x64d()
- you can also specify a model to run CE on your preferred model
else:
# net = SPECIFY YOUR MODEL (REFER TO CODE ON cifar10/models)
- to run code with CE, simply type
bash trainCE.sh
- if you want to run code with KD, simply type
bash trainKD.sh
note that models should be changed manually by hand
For student network, we used EfficientNet-B0 (90.67%) which is then finetuned with KD.
All students' accuracies finetuned with teacher networks were increased and fell in the bound within 1% accuracy drop from that of teachers'.
Teacher | Student | Teacher acc | Student accuracy achieved with KD |
---|---|---|---|
ResNeXt29_32x8d | EfficientNet B0 | 92.66% | 92.20% |
Resnet152 | EfficientNet B0 | 92.41% | 92.12% |
DenseNet40 | EfficientNet B0 | 92.26% | 91.63% |
Teacher | Student | Teacher Size | Student Size | compression ratio |
---|---|---|---|---|
ResNet152 | EfficientNet B0 | 72M | 6.6M | 9.16% |
Since ResNet152 showed promising accuracy and Size Ratio with EfficientNet-B0, it was chosen as the student and teacher pair for ImageNet
Teacher | Student | Teacher Size | Student Size | compression ratio |
---|---|---|---|---|
ResNet152 | EfficientNet B0 | 104.4M | 20.7M | 19.83% |
- to run code, type
bash train.sh
- some options to give in train.sh
--arch : you can manually pick student model (efficientnets, models that are in torchvision.)
--teacher-arch : you can manually pick teacher model. They are all pretrained model. For EfficientNet please refer [here](https://github.com/lukemelas/EfficientNet-PyTorch/), otherwise refer to [torchvision documentation](https://pytorch.org/docs/stable/torchvision/models.html)).
--workers : how many threads are used for multiprocessing of CPU (4~8 suggested)
--kd : to do kd or not (if not, only student will be learned with CE)
--T : temperature parameter used for kd
--w : ratio paramter used for kd
--overhaul : implementation of SOTA for kd. (undergoing)
--save_path "$YOUR FOLDER" : saves models on specified folder
other trivial options are not stated here. Please refer to code
Teacher | Student | Teacher acc | Student acc | Student accuracy achieved with KD |
---|---|---|---|---|
ResNet152 | EfficientNet B0 | 78.31% | 76.3% | 77.07% |