Support Mixtral quantization using INC #188
Conversation
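For context, INC is Intel Neural Compressor, which this fork uses for FP8 measurement and quantization on Gaudi (HPU). Below is a minimal, hedged sketch of how INC quantization is typically enabled from the vLLM API in this fork; the flag values ("inc", "fp8_inc"), the QUANT_CONFIG environment variable, and the config path are assumptions drawn from the fork's documentation, not code from this PR.

```python
# Hedged sketch: enabling INC-based FP8 quantization for Mixtral in the
# HPU fork of vLLM. The "inc" / "fp8_inc" values and the QUANT_CONFIG
# environment variable are assumptions and may differ in the exact
# revision this PR targets.
import os
from vllm import LLM, SamplingParams

# INC reads its measurement/quantization settings from a JSON file
# pointed to by this environment variable (path is illustrative).
os.environ["QUANT_CONFIG"] = "/path/to/maxabs_quant.json"

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization="inc",        # route linear layers through INC FP8 kernels
    kv_cache_dtype="fp8_inc",  # optionally keep the KV cache in FP8 as well
    tensor_parallel_size=2,
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```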
Force-pushed from f710cc7 to 3e503d0. The pushed merge commits resolve conflicts in vllm/hpu/ops.py and vllm/model_executor/layers/quantization/inc.py.
Force-pushed from 789cc54 to be7f696
Force-pushed from 53cdd9b to 5b515fa
Force-pushed from 5b515fa to 1a5fd1d
Force-pushed from 0dc3495 to 4ea6b7c
Force-pushed from 4ea6b7c to e7106dc
```diff
@@ -13,9 +13,6 @@
 from vllm.model_executor.utils import set_weight_attrs
```
@kzawora-intel could you check if these changes are upstreamable?
Merge commit pushed, resolving a conflict in vllm/hpu/ops.py.
LGTM :)
```diff
@@ -16,6 +16,7 @@
 import habana_frameworks.torch as htorch
 import torch
 from neural_compressor.torch.quantization import finalize_calibration
```
This breaks taking FP8 measurements with TP>1; please revert the changes made to this file.
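For readers unfamiliar with the import under discussion: finalize_calibration is the step in INC's FP8 flow that dumps the statistics collected during a measurement run. Below is a minimal sketch of that flow, assuming INC 3.x's neural_compressor.torch.quantization API; the helper name calibrate_or_quantize and the calib_batches argument are illustrative, and config.measure / config.quantize mirror the JSON config that QUANT_CONFIG points to.

```python
# Minimal sketch of the INC FP8 measure/quantize flow that
# finalize_calibration belongs to (neural_compressor.torch, INC 3.x).
# calibrate_or_quantize and calib_batches are illustrative names,
# not code from this PR.
import torch
from neural_compressor.torch.quantization import (
    FP8Config,
    convert,
    finalize_calibration,
    prepare,
)

def calibrate_or_quantize(model: torch.nn.Module, config_path: str,
                          calib_batches) -> torch.nn.Module:
    config = FP8Config.from_json_file(config_path)
    if config.measure:
        # Measurement pass: insert observers, run calibration data,
        # then dump the collected scales/statistics to disk.
        model = prepare(model, config)
        with torch.no_grad():
            for batch in calib_batches:
                model(batch)
        finalize_calibration(model)  # writes the measurement files
    elif config.quantize:
        # Quantization pass: swap modules for FP8 variants using the
        # previously dumped measurements.
        model = convert(model, config)
    return model
```

Under tensor parallelism each worker takes its own measurements, which is presumably why the reviewer flags the TP>1 regression above.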