Support Mixtral quantization using INC #188
Conversation
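For context, INC is Intel Neural Compressor, which this fork uses for FP8 measurement and quantization on Gaudi (HPU). Below is a minimal, hedged sketch of how INC quantization is typically enabled from the vLLM API in this fork; the flag values ("inc", "fp8_inc"), the QUANT_CONFIG environment variable, and the config path are assumptions drawn from the fork's documentation, not code from this PR.

```python
# Hedged sketch: enabling INC-based FP8 quantization for Mixtral in the
# HPU fork of vLLM. The "inc" / "fp8_inc" values and the QUANT_CONFIG
# environment variable are assumptions and may differ in the exact
# revision this PR targets.
import os
from vllm import LLM, SamplingParams

# INC reads its measurement/quantization settings from a JSON file
# pointed to by this environment variable (path is illustrative).
os.environ["QUANT_CONFIG"] = "/path/to/maxabs_quant.json"

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization="inc",        # route linear layers through INC FP8 kernels
    kv_cache_dtype="fp8_inc",  # optionally keep the KV cache in FP8 as well
    tensor_parallel_size=2,
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```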
Force-pushed from f710cc7 to 3e503d0. The pushed merge commits resolve conflicts in vllm/hpu/ops.py and vllm/model_executor/layers/quantization/inc.py.
Force-pushed from 789cc54 to be7f696
Force-pushed from 53cdd9b to 5b515fa
Force-pushed from 5b515fa to 1a5fd1d
Force-pushed from 0dc3495 to 4ea6b7c
Force-pushed from 4ea6b7c to e7106dc
```diff
@@ -13,9 +13,6 @@
 from vllm.model_executor.utils import set_weight_attrs
```
@kzawora-intel could you check if these changes are upstreamable?
Merge commit pushed, resolving a conflict in vllm/hpu/ops.py.
LGTM :)
```diff
@@ -16,6 +16,7 @@
 import habana_frameworks.torch as htorch
 import torch
 from neural_compressor.torch.quantization import finalize_calibration
```
This breaks taking FP8 measurements with TP>1; please revert the changes made to this file.
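For readers unfamiliar with the import under discussion: finalize_calibration is the step in INC's FP8 flow that dumps the statistics collected during a measurement run. Below is a minimal sketch of that flow, assuming INC 3.x's neural_compressor.torch.quantization API; the helper name calibrate_or_quantize and the calib_batches argument are illustrative, and config.measure / config.quantize mirror the JSON config that QUANT_CONFIG points to.

```python
# Minimal sketch of the INC FP8 measure/quantize flow that
# finalize_calibration belongs to (neural_compressor.torch, INC 3.x).
# calibrate_or_quantize and calib_batches are illustrative names,
# not code from this PR.
import torch
from neural_compressor.torch.quantization import (
    FP8Config,
    convert,
    finalize_calibration,
    prepare,
)

def calibrate_or_quantize(model: torch.nn.Module, config_path: str,
                          calib_batches) -> torch.nn.Module:
    config = FP8Config.from_json_file(config_path)
    if config.measure:
        # Measurement pass: insert observers, run calibration data,
        # then dump the collected scales/statistics to disk.
        model = prepare(model, config)
        with torch.no_grad():
            for batch in calib_batches:
                model(batch)
        finalize_calibration(model)  # writes the measurement files
    elif config.quantize:
        # Quantization pass: swap modules for FP8 variants using the
        # previously dumped measurements.
        model = convert(model, config)
    return model
```

Under tensor parallelism each worker takes its own measurements, which is presumably why the reviewer flags the TP>1 regression above.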