
Experts per token option missing at model load screen for GGUF (reg and _hff loader) #6607

Open
David-AU-github opened this issue Dec 26, 2024 · 0 comments
Labels
bug Something isn't working
Describe the bug

There is no option to select the number of experts to use for MoE models in GGUF format.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Attempt to load a MoE GGUF model with either loader: llamacpp or llamacpp_HF.

Screenshot

No response

Logs

n/a

System Info

Windows 11, Nvidia 4060 Ti.
Also reported by other users when attempting to load any of my MoE models (10) from my repo
(DavidAU at hug..face).

NOTE: These MoEs are built using the NEWEST llamacpp versions, not older llamacpp versions that may produce "broken" MoEs.

For llama-server.exe I use this workaround (example run from PowerShell):

./llama-server -m i:/llm/David_AU/testfiles/Grand-Horror-MOE-4X8-series1-Q4_K_S.gguf -c 4096 -ngl 99 --override-kv llama.expert_used_count=int:4
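For anyone scripting batch launches, the same workaround can be assembled programmatically; a minimal sketch (the model path below is a placeholder, and the helper name is my own, not part of llama.cpp):

```python
# Sketch: build the llama-server command line with the
# --override-kv expert-count workaround described above.
# The model path and expert count are placeholders.
def build_server_cmd(model_path: str, experts: int,
                     ctx: int = 4096, ngl: int = 99) -> list[str]:
    return [
        "./llama-server",
        "-m", model_path,
        "-c", str(ctx),
        "-ngl", str(ngl),
        # Override the GGUF metadata key that controls how many
        # experts are activated per token.
        "--override-kv", f"llama.expert_used_count=int:{experts}",
    ]

cmd = build_server_cmd("Grand-Horror-MOE-4X8-series1-Q4_K_S.gguf", 4)
print(" ".join(cmd))
```

This only reproduces the CLI workaround; the feature request here is for an equivalent option in the model-load UI for both GGUF loaders.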