Welcome to this repository, which showcases advanced neural architecture discovery and optimization solutions from Intel Labs. Here you'll find cutting-edge research papers and their corresponding code implementations, all aimed at pushing the boundaries of model efficiency and performance.
Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
Links: Paper | Code
MultiPruner is a training-free pruning approach for large pre-trained models that iteratively compresses residual blocks, MLP channels, and MHA heads, achieving superior zero-shot accuracy and model compression.
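To make the idea concrete, here is a minimal, self-contained sketch of training-free structured pruning over residual blocks. The weight-norm importance proxy and the toy block definition are illustrative assumptions, not MultiPruner's actual criteria or pruning granularity.

```python
# Minimal sketch of iterative, training-free structured pruning in the spirit of
# MultiPruner. The importance proxy (weight norm) and block granularity are
# illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn

def block_importance(block: nn.Module) -> float:
    # Hypothetical proxy: total L2 norm of the block's parameters.
    return sum(p.detach().norm().item() for p in block.parameters())

def prune_blocks(blocks: nn.ModuleList, n_remove: int) -> nn.ModuleList:
    # Rank residual blocks by the proxy and drop the least important ones.
    ranked = sorted(range(len(blocks)), key=lambda i: block_importance(blocks[i]))
    keep = sorted(set(range(len(blocks))) - set(ranked[:n_remove]))
    return nn.ModuleList(blocks[i] for i in keep)

# Toy "model": a stack of residual MLP blocks.
blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
                       for _ in range(8))
pruned = prune_blocks(blocks, n_remove=2)
print(len(pruned))  # 6 blocks remain, no training or fine-tuning involved
```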
Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
Conference: EMNLP 2024 Findings
Links: Paper | Code
SQFT fine-tunes sparse and low-precision LLMs with parameter-efficient techniques: it merges sparse weights with low-rank adapters while maintaining sparsity and accuracy, and it handles quantized weights and adapters of different numerical precisions.
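Below is a small sketch of sparsity-preserving adapter merging in this spirit. Re-applying the base weight's sparsity mask after the merge is an assumption used for illustration, and SQFT's handling of quantized weights is omitted.

```python
# Illustrative sketch of sparsity-preserving LoRA merging; not SQFT's actual recipe.
import torch

def merge_lora_sparse(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                      scaling: float = 1.0) -> torch.Tensor:
    """Merge a LoRA update (B @ A) into a sparse weight W, keeping W's zeros zero."""
    mask = (W != 0).float()            # sparsity pattern of the base weight
    merged = W + scaling * (B @ A)     # standard LoRA merge
    return merged * mask               # re-impose the original sparsity pattern

W = torch.randn(16, 16)
W[torch.rand_like(W) < 0.5] = 0.0      # 50% unstructured sparsity
A, B = torch.randn(4, 16), torch.randn(16, 4)
W_merged = merge_lora_sparse(W, A, B, scaling=0.5)
assert torch.all(W_merged[W == 0] == 0)  # zeros of the base weight are preserved
```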
Authors: Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Munoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah
Links: Paper | Code
SparAMX leverages AMX support on the latest Intel CPUs, together with unstructured sparsity in linear layers, to reduce end-to-end latency compared to the stock PyTorch implementation.
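The snippet below only illustrates the general idea of exploiting unstructured sparsity in linear layers, using PyTorch's built-in sparse matmul as a stand-in; SparAMX's speedups come from custom AMX kernels, which are not shown here, and the sparsity level is a made-up example.

```python
# Conceptual sketch only: unstructured sparsity in a linear layer via PyTorch's
# sparse matmul. This is not SparAMX's AMX kernel.
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    """Drop-in linear layer that stores its (pruned) weight in sparse COO form."""
    def __init__(self, dense: nn.Linear, sparsity: float = 0.7):
        super().__init__()
        W = dense.weight.detach().clone()
        W[torch.rand_like(W) < sparsity] = 0.0   # hypothetical unstructured pruning
        self.weight = W.to_sparse()               # keep only the non-zero entries
        self.bias = dense.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x @ W^T + b, computed with a sparse-dense matmul.
        y = torch.sparse.mm(self.weight, x.t()).t()
        return y + self.bias if self.bias is not None else y

layer = SparseLinear(nn.Linear(128, 64))
print(layer(torch.randn(4, 128)).shape)  # torch.Size([4, 64])
```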
Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
Conference: NAACL 2024 (Industry Track)
Links: Paper | Code
Shears integrates cost-effective sparsity and Neural Low-rank Adapter Search (NLS) to further improve the efficiency of Parameter-Efficient Fine-Tuning (PEFT) approaches.
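The sketch below illustrates the "elastic rank" idea behind NLS with a hypothetical ElasticLoRA module whose active rank can be switched during search; it is an illustration of the concept, not Shears' actual implementation.

```python
# Minimal sketch of an adapter with a switchable ("elastic") rank, in the spirit of
# Neural Low-rank Adapter Search. Class name and rank choices are assumptions.
import torch
import torch.nn as nn

class ElasticLoRA(nn.Module):
    def __init__(self, in_f: int, out_f: int, max_rank: int = 32):
        super().__init__()
        self.A = nn.Parameter(torch.randn(max_rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, max_rank))
        self.active_rank = max_rank

    def set_rank(self, r: int):
        self.active_rank = r  # sub-adapter selected by the search algorithm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        # Only the first r rows/columns of the shared super-adapter are used.
        return x @ self.A[:r].t() @ self.B[:, :r].t()

adapter = ElasticLoRA(768, 768, max_rank=32)
for r in (8, 16, 32):          # candidate ranks explored during the search
    adapter.set_rank(r)
    _ = adapter(torch.randn(2, 768))
```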
Authors: J. Pablo Muñoz, Jinjie Yuan, Yi Zheng, Nilesh Jain
Conference: LREC-COLING 2024
Links: Paper | Code
LoNAS explores weight-sharing NAS for compressing large language models using elastic low-rank adapters, yielding compressed models that balance efficiency and performance.
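As a rough illustration of weight-sharing search over per-layer adapter configurations, the following sketch randomly samples rank choices and scores them with a hypothetical proxy objective; LoNAS's real search space and search strategy are more involved.

```python
# Illustrative random search over per-layer adapter ranks drawn from one shared
# super-network. The search space and proxy objective are assumptions.
import random

search_space = {f"layer_{i}": [8, 16, 32] for i in range(12)}  # candidate ranks per layer

def sample_config(space):
    # One sub-network = one rank choice per layer, all reusing the shared weights.
    return {name: random.choice(ranks) for name, ranks in space.items()}

def proxy_score(config):
    # Hypothetical objective: prefer smaller total adapter rank
    # (a stand-in for the accuracy/efficiency trade-off).
    return -sum(config.values())

best = max((sample_config(search_space) for _ in range(100)), key=proxy_score)
print(best)
```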
Authors: J. Pablo Muñoz, Yi Zheng, Nilesh Jain
Conference: LREC-COLING 2024
Links: Paper | Code
EFTNAS integrates neural architecture search and network pruning to automatically generate and train efficient, high-performing, and compressed transformer-based models for NLP tasks.
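One way to picture the combination of pruning and NAS is to let an importance score decide which channels each candidate width keeps. The sketch below uses plain weight magnitude as a hypothetical proxy; it is an assumption for illustration, not EFTNAS's exact criterion.

```python
# Sketch: a pruning-style importance score defines the channel subsets behind each
# candidate width in a NAS search space. Proxy and width grid are assumptions.
import torch
import torch.nn as nn

def width_choices(linear: nn.Linear, widths=(0.5, 0.75, 1.0)):
    # Rank output channels by L1 norm and keep the top-k for each candidate width.
    importance = linear.weight.detach().abs().sum(dim=1)
    order = importance.argsort(descending=True)
    return {w: order[: int(w * linear.out_features)].sort().values for w in widths}

layer = nn.Linear(768, 3072)
choices = width_choices(layer)
print({w: len(idx) for w, idx in choices.items()})  # {0.5: 1536, 0.75: 2304, 1.0: 3072}
```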
Authors: Yash Akhauri, J. Pablo Muñoz, Nilesh Jain, Ravi Iyer
Conference: NeurIPS 2022
Links: Paper | Code
EZNAS is a genetic programming-driven methodology for automatically discovering Zero-Cost Neural Architecture Scoring Metrics (ZC-NASMs).
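For context, a zero-cost scoring metric is a cheap, training-free program that ranks candidate architectures. The hand-written gradient-norm metric below is only an example of such a program, not one discovered by EZNAS.

```python
# A hand-written zero-cost scoring metric (gradient norm over one random batch),
# shown to illustrate the kind of program EZNAS searches for automatically.
import torch
import torch.nn as nn

def zero_cost_score(model: nn.Module, input_shape=(8, 3, 32, 32)) -> float:
    x = torch.randn(*input_shape)
    model(x).sum().backward()
    # Score = total L1 norm of gradients; no training is required.
    return sum(p.grad.abs().sum().item() for p in model.parameters() if p.grad is not None)

candidate = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
print(zero_cost_score(candidate))
```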
Authors: J. Pablo Muñoz, Nikolay Lyalyushkin, Yash Akhauri, Anastasia Senina, Alexander Kozlov, Chaunte Lacewell, Daniel Cummings, Anthony Sarah, Nilesh Jain
Conferences: AutoML 2022 (Main Track), AAAI 2022 (Practical Deep Learning in the Wild)
Links: Paper (AutoML) | Paper (AAAI) | Code
BootstrapNAS generates weight-sharing super-networks from pre-trained models and discovers efficient sub-networks.
Explore practical examples and notebooks related to NNCF's BootstrapNAS, a tool designed to facilitate neural architecture search and optimization.
Links: Code
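To convey what weight sharing means here, the sketch below slices a pre-trained linear layer so that every sub-network reuses the same pre-trained weights. This is a conceptual illustration only and does not use NNCF's BootstrapNAS API.

```python
# Conceptual sketch of weight sharing in a super-network built from a pre-trained
# layer: sub-networks of different widths reuse slices of the same weights.
import torch
import torch.nn as nn

class ElasticWidthLinear(nn.Module):
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = pretrained.weight   # shared across all sub-networks
        self.bias = pretrained.bias
        self.active_out = pretrained.out_features

    def set_active_width(self, out_features: int):
        self.active_out = out_features    # chosen by the sub-network search

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = self.weight[: self.active_out]
        b = self.bias[: self.active_out] if self.bias is not None else None
        return nn.functional.linear(x, W, b)

layer = ElasticWidthLinear(nn.Linear(256, 256))
layer.set_active_width(128)
print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 128])
```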
We hope you find these resources valuable for your research and development in the field of model optimization. Happy exploring! 🌟