Performance (reproducible) Issue #1058
Hi @Thunderbeee, the models in your table do not match the model you linked to. Achieving accuracy recovery after intensive compression (both sparsification and quantization) for small models (<8B) is often very difficult. 1B and 2B models are already extremely parameter-efficient, so compressing them affects accuracy more than compressing larger models, which are typically less parameter-efficient and therefore have more capacity for compression.
If yours is a TinyML application that requires small models, I would recommend focusing on fine-tuning your model on your specific task/domain and evaluating on that task. I would not expect compressed (especially fine-tuned) 1B models to perform well on generalized benchmarks such as MMLU.
Thank you @kylesayrs! I get it. I found that sparsity + distillation may achieve a better result, so I followed the example in trl_mixin/ex_trl_distillation.py and changed the recipe, but it could not perform SparseGPTModifier. Could you please provide instructions on how to perform SparseGPT + distillation, and explain why ConstantPruningModifier works but SparseGPTModifier does not? Thanks so much!
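For context, a staged recipe that runs SparseGPT as a one-shot pruning pass and then preserves the mask while distilling during fine-tuning might look roughly like the sketch below. This is only an illustration modeled on public llmcompressor examples; the stage names and modifier arguments (`sparsity`, `mask_structure`, `targets`, `comparison`, the scale factors) are assumptions and should be checked against the version of the library you are running.

```python
# Hypothetical sketch of a staged recipe combining one-shot SparseGPT pruning
# with mask-preserving fine-tuning + knowledge distillation.
# Stage names and modifier arguments are assumptions based on public
# llmcompressor examples; verify them against your installed version.
recipe = """
sparsity_stage:
  run_type: oneshot
  sparsity_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"        # 2:4 semi-structured sparsity
      targets: ["Linear"]          # prune linear layers
finetuning_stage:
  run_type: train
  finetuning_modifiers:
    ConstantPruningModifier:       # keep the SparseGPT mask fixed during training
      targets: ["re:.*model.layers.*"]
      start: 0
    OutputDistillationModifier:    # distill from the dense teacher's layer outputs
      targets: ["re:.*model.layers.*"]
      comparison: "square_head"
      start: 0
      orig_scale: 1.0
      distill_scale: 1.0
"""
```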
Hey @Thunderbeee! Can you explain more about what you mean when you say it "could not perform SparseGPTModifier"? For more information on the …
Thanks @kylesayrs! By "SparseGPTModifier could not be performed" I mean that it was skipped and knowledge distillation started directly. I checked the code, and I think this is because of llmcompressor/modifiers/modifier.py:107. Therefore, I changed the recipe to recipe = """…""", but got the following error:
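One simple way to tell whether the SparseGPT stage actually pruned the model or was skipped before KD began is to check the fraction of zero weights in the linear layers after the one-shot stage. Below is a minimal sketch in plain PyTorch; the expectation of roughly 50% zeros assumes a 2:4 mask like the one sketched above.

```python
import torch
from torch import nn

def layer_sparsity(linear: nn.Linear) -> float:
    """Return the fraction of exactly-zero weights in a linear layer."""
    weight = linear.weight.detach()
    return (weight == 0).float().mean().item()

def report_model_sparsity(model: nn.Module) -> None:
    # After a successful 2:4 SparseGPT pass, pruned layers should show ~50% zeros;
    # values near 0% suggest the sparsity stage was skipped.
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            print(f"{name}: {layer_sparsity(module):.1%} zeros")
```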
Hi team! Thanks so much for your work. I followed the instructions provided in #1028, but the final accuracy is not desirable. I used ultrachat as the dataset and tested on MMLU.
However, the Hugging Face model card at https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4 reports extremely high accuracy recovery. I am wondering how to reproduce the officially reported result. Thanks!
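When comparing against the model card, it also helps to match the evaluation setup exactly. A minimal MMLU run with lm-evaluation-harness might look like the sketch below; the model path is a placeholder, and the 5-shot / bfloat16 settings are assumptions that should be aligned with whatever the card reports.

```python
# Minimal MMLU evaluation sketch with lm-evaluation-harness (pip install lm-eval).
# "path/to/compressed-model" is a placeholder; num_fewshot=5 and bfloat16 are
# assumptions -- align them with the settings reported on the model card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/compressed-model,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=8,
)
print(results["results"])  # per-task and aggregate MMLU scores
```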