[Misc] Add Gemma2 GGUF support #12186

Isotr0py · 2025-01-18T14:16:43Z

FIX #12000

Signed-off-by: Isotr0py <[email protected]>

github-actions · 2025-01-18T14:16:53Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

summersonnn · 2025-01-21T13:02:27Z

@Isotr0py Hi. I couldn't wait until the merge and gave it a try. Got this:

File "/persistent/virtualenvs/llm/lib/python3.10/site-packages/torch/_tensor.py", line 983, in split
    return torch._VF.split_with_sizes(self, split_size, dim)
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 8192 (input tensor's size at dimension -1), but got split_sizes=[8192, 4096, 4096]

This happened while serving gemma-2-27b-it-Q4_K_M. Any idea? I can post the full trace if you need it.

vllm==0.6.6.post1 (+ your changes)
transformers==4.48.1
torch==2.5.1

Isotr0py · 2025-01-21T13:51:52Z

I see. This seems to be caused by an incorrect head_dim produced by the GGUF -> HF config conversion in transformers. The correct head_dim should be 128, but the value extracted from the GGUF file is the default 256...
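
To make the arithmetic behind the error concrete, here is a minimal sketch of how the split sizes for a fused QKV projection follow from head_dim. The helper name `qkv_split_sizes` is hypothetical and is not vLLM's API; the head counts (32 query heads, 16 KV heads) are an assumption inferred from the split sizes in the traceback together with the head_dim of 256 mentioned above.

```python
def qkv_split_sizes(num_heads: int, num_kv_heads: int, head_dim: int) -> list[int]:
    """Return the [q, k, v] widths used to split a fused QKV projection output.

    Hypothetical helper for illustration only; not taken from vLLM.
    """
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim
    return [q_size, kv_size, kv_size]


# Assumption: head counts inferred from the traceback's split sizes
# ([8192, 4096, 4096] with head_dim=256 implies 32 query heads and 16 KV heads).
NUM_HEADS = 32
NUM_KV_HEADS = 16

# Size of the tensor's last dimension, as reported in the RuntimeError.
FUSED_QKV_WIDTH = 8192

# head_dim=256 is the default value picked up from the GGUF -> HF conversion.
wrong = qkv_split_sizes(NUM_HEADS, NUM_KV_HEADS, head_dim=256)
# head_dim=128 is the value stated as correct in this thread.
right = qkv_split_sizes(NUM_HEADS, NUM_KV_HEADS, head_dim=128)

print(wrong, sum(wrong) == FUSED_QKV_WIDTH)
print(right, sum(right) == FUSED_QKV_WIDTH)
```

Under these assumptions, head_dim=256 reproduces the split sizes from the error message, which sum to twice the tensor width, while head_dim=128 yields sizes that sum to exactly 8192.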

add gemma2 gguf support

3407e8d

Signed-off-by: Isotr0py <[email protected]>

Isotr0py added 2 commits January 21, 2025 21:30

Merge branch 'vllm-project:main' into gemma2-gguf

feb08df

gemma2 weight conversion

0b03450

Signed-off-by: Isotr0py <[email protected]>

Isotr0py mentioned this pull request Jan 21, 2025

Fix head_dim in config extracted from Gemma2 GGUF model huggingface/transformers#35818

Open

5 tasks
