Model Optimization Research 🚀

Welcome! This repository showcases advanced neural architecture discovery and optimization research from Intel Labs. Here you'll find cutting-edge research papers and their corresponding code implementations, all aimed at pushing the boundaries of model efficiency and performance.

Featured Research Papers 📚

MultiPruner: Fine-Grained Training-Free Structure Removal in Foundation Models

Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
Links: Paper | Code

MultiPruner is a training-free pruning approach for large pre-trained models that iteratively compresses residual blocks, MLP channels, and MHA heads, achieving superior zero-shot accuracy and model compression.
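The core idea, an iterative training-free loop that ranks candidate structures (residual blocks, MLP channels, MHA heads) by an importance proxy and removes the least important ones, can be sketched roughly as follows. This is a conceptual illustration with hypothetical names and a toy importance metric, not the repository's actual API:

```python
# Conceptual sketch of training-free iterative structured pruning, in the spirit of
# MultiPruner (hypothetical helper names, not the repository's implementation).
import torch

def importance(weight: torch.Tensor) -> float:
    # Hypothetical proxy: mean absolute weight of the structure (no training involved).
    return weight.abs().mean().item()

def prune_structures(structures: dict[str, torch.Tensor], target_ratio: float) -> set[str]:
    """Greedily select the least important structures to remove."""
    ranked = sorted(structures, key=lambda name: importance(structures[name]))
    n_remove = int(len(ranked) * target_ratio)
    return set(ranked[:n_remove])

# Toy example: "structures" could stand in for residual blocks, MHA heads, or MLP channels.
structures = {f"block_{i}": torch.randn(64, 64) for i in range(12)}
removed = prune_structures(structures, target_ratio=0.25)
print(f"Removing {len(removed)} of {len(structures)} structures: {sorted(removed)}")
```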


SQFT: Low-cost Model Adaptation in Low-precision Sparse Foundation Models

Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
Conference: EMNLP 2024 Findings
Links: Paper | Code

SQFT fine-tunes sparse and low-precision LLMs using parameter-efficient techniques, merging sparse weights with low-rank adapters while maintaining sparsity and accuracy, and handling quantized weights and adapters of different precisions.
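A minimal sketch of the sparsity-preserving merge step, assuming the base weight is already pruned and the adapter is a standard low-rank pair (the shapes and function names below are illustrative, not SQFT's actual API):

```python
# Merge a low-rank adapter into a sparse base weight while keeping the original
# sparsity pattern (conceptual sketch in the spirit of SQFT).
import torch

def merge_adapter_preserving_sparsity(w: torch.Tensor,
                                      lora_a: torch.Tensor,
                                      lora_b: torch.Tensor,
                                      scaling: float = 1.0) -> torch.Tensor:
    mask = (w != 0).to(w.dtype)          # sparsity pattern of the pruned base weight
    delta = scaling * (lora_b @ lora_a)  # dense low-rank update with the same shape as w
    return (w + delta) * mask            # re-apply the mask so the merged weight stays sparse

# Toy example: a ~50%-sparse weight with a rank-8 adapter.
w = torch.randn(256, 256) * (torch.rand(256, 256) > 0.5)
lora_a, lora_b = torch.randn(8, 256), torch.randn(256, 8)
merged = merge_adapter_preserving_sparsity(w, lora_a, lora_b, scaling=0.5)
assert torch.all(merged[w == 0] == 0)    # zeros of the base weight are preserved
```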


SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs

Authors: Ahmed F. AbouElhamayed, Jordan Dotzel, Yash Akhauri, Chi-Chih Chang, Sameh Gobriel, J. Pablo Munoz, Vui Seng Chua, Nilesh Jain, Mohamed S. Abdelfattah
Links: Paper | Code

SparAMX combines AMX support on the latest Intel CPUs with unstructured sparsity in linear layers to reduce end-to-end token-generation latency compared to the current PyTorch implementation.
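For context, unstructured sparsity simply zeros out individual weights of a linear layer; SparAMX's speedups come from custom AMX sparse kernels that exploit those zeros, which the short sketch below does not attempt to reproduce. It only shows magnitude-based sparsification of a standard PyTorch layer (illustrative helper, not the repository's code):

```python
# Unstructured magnitude sparsification of a PyTorch linear layer (illustration only).
import torch
import torch.nn as nn

def apply_unstructured_sparsity(layer: nn.Linear, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights in place."""
    w = layer.weight.data
    k = int(w.numel() * sparsity)
    if k == 0:
        return
    threshold = w.abs().flatten().kthvalue(k).values
    w.mul_((w.abs() > threshold).to(w.dtype))

layer = nn.Linear(1024, 1024)
apply_unstructured_sparsity(layer, sparsity=0.7)
print(f"Actual sparsity: {(layer.weight == 0).float().mean():.2f}")
```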


Shears: Unstructured Sparsity with Neural Low-rank Adapter Search

Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
Conference: NAACL 2024 (Industry Track)
Links: Paper | Code

Shears integrates cost-effective sparsity and Neural Low-rank Adapter Search (NLS) to further improve the efficiency of Parameter-Efficient Fine-Tuning (PEFT) approaches.
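One way to picture NLS is an "elastic" LoRA module whose active rank can be sub-selected during the search, so a single super-network covers many adapter configurations. The module below is a hypothetical sketch under that assumption, not the Shears implementation:

```python
# Elastic LoRA adapter whose active rank can be changed at search time (conceptual sketch).
import torch
import torch.nn as nn

class ElasticLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, max_rank: int = 32):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)           # frozen (possibly sparse) base weight
        self.lora_a = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, max_rank))
        self.active_rank = max_rank                      # search space could be e.g. {8, 16, 32}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.active_rank
        delta = self.lora_b[:, :r] @ self.lora_a[:r, :]  # use only the first r rank components
        return x @ (self.base.weight + delta).t()

layer = ElasticLoRALinear(128, 128, max_rank=32)
x = torch.randn(4, 128)
for r in (8, 16, 32):                                    # a search controller would pick per-layer ranks
    layer.active_rank = r
    print(r, layer(x).shape)
```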


LoNAS: Elastic Low-Rank Adapters for Efficient Large Language Models

Authors: J. Pablo Muñoz, Jinjie Yuan, Yi Zheng, Nilesh Jain
Conference: LREC-COLING 2024
Links: Paper | Code

LoNAS explores weight-sharing NAS to compress large language models with elastic low-rank adapters, producing models that balance efficiency and performance.


EFTNAS: Searching for Efficient Language Models in First-Order Weight-Reordered Super-Networks

Authors: J. Pablo Muñoz, Yi Zheng, Nilesh Jain
Conference: LREC-COLING 2024
Links: Paper | Code

EFTNAS integrates neural architecture search and network pruning to automatically generate and train efficient, high-performing, and compressed transformer-based models for NLP tasks.


EZNAS: Evolving Zero Cost Proxies For Neural Architecture Scoring

Authors: Yash Akhauri, J. Pablo Muñoz, Nilesh Jain, Ravi Iyer
Conference: NeurIPS 2022
Links: Paper | Code

EZNAS is a genetic programming-driven methodology for automatically discovering Zero-Cost Neural Architecture Scoring Metrics (ZC-NASMs).
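For intuition, a zero-cost proxy scores an untrained architecture from a single minibatch; the gradient-norm proxy below is one classic hand-designed example of the kind of expression EZNAS evolves automatically (this is a generic illustration, not EZNAS's discovered metric):

```python
# A simple zero-cost architecture scoring metric: gradient norm at initialization.
import torch
import torch.nn as nn

def grad_norm_score(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    return sum(p.grad.abs().sum().item() for p in model.parameters() if p.grad is not None)

# Score two candidate architectures without any training.
candidates = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10)),
              nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10))]
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
print([grad_norm_score(m, x, y) for m in candidates])
```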


BootstrapNAS: Enabling NAS with Automated Super-Network Generation

Authors: J. Pablo Muñoz, Nikolay Lyalyushkin, Yash Akhauri, Anastasia Senina, Alexander Kozlov, Chaunte Lacewell, Daniel Cummings, Anthony Sarah, Nilesh Jain
Conferences: AutoML 2022 (Main Track), AAAI 2022 (Practical Deep Learning in the Wild)
Links: Paper (AutoML) | Paper (AAAI) | Code

BootstrapNAS generates weight-sharing super-networks from pre-trained models and discovers efficient subnetworks.

Additional Resources 📂

NNCF's BootstrapNAS - Notebooks and Examples

Explore practical examples and notebooks related to NNCF's BootstrapNAS, a tool designed to facilitate neural architecture search and optimization.
Links: Code


We hope you find these resources valuable for your research and development in the field of model optimization. Happy exploring! 🌟
