
Experts per token option missing at model load screen for GGUF (reg and _hff loader) #6607

Open
David-AU-github opened this issue Dec 26, 2024 · 0 comments
Labels
bug Something isn't working
Describe the bug

There is no option to select the number of experts to use for MoE models in GGUF format.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Attempt to load a MoE GGUF model with either loader: llamacpp or llamacpp_HF.

Screenshot

No response

Logs

n/a

System Info

Windows 11, Nvidia 4060 Ti.
Also reported by other users when attempting to load any of my MoE models (10) from my repo
(DavidAU at hug..face).

NOTE: These MoEs are built using the NEWEST llamacpp versions, not older llamacpp versions that may produce "broken" MoEs.

For llama-server.exe I use this workaround (example run from PowerShell):

./llama-server -m i:/llm/David_AU/testfiles/Grand-Horror-MOE-4X8-series1-Q4_K_S.gguf -c 4096 -ngl 99 --override-kv llama.expert_used_count=int:4
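For anyone scripting batch launches, the same workaround can be assembled programmatically; a minimal sketch (the model path below is a placeholder, and the helper name is my own, not part of llama.cpp):

```python
# Sketch: build the llama-server command line with the
# --override-kv expert-count workaround described above.
# The model path and expert count are placeholders.
def build_server_cmd(model_path: str, experts: int,
                     ctx: int = 4096, ngl: int = 99) -> list[str]:
    return [
        "./llama-server",
        "-m", model_path,
        "-c", str(ctx),
        "-ngl", str(ngl),
        # Override the GGUF metadata key that controls how many
        # experts are activated per token.
        "--override-kv", f"llama.expert_used_count=int:{experts}",
    ]

cmd = build_server_cmd("Grand-Horror-MOE-4X8-series1-Q4_K_S.gguf", 4)
print(" ".join(cmd))
```

This only reproduces the CLI workaround; the feature request here is for an equivalent option in the model-load UI for both GGUF loaders.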