BERT distillation and quantization in a distributed setting using the FairScale library. This repository contains several versions of the training code for the GLUE MRPC task, each using a different level of parallelism built on PyTorch and FairScale's constructs: