Qwen MoE gives strange output #481

Open
khoangothe opened this issue Jan 5, 2025 · 0 comments

@khoangothe

Hi team, thank you for the awesome repo. I have had a lot of success merging models. However, when I tried to build a Qwen MoE merge with the configuration below, the merged model's output is broken.
Here is the config that I used:

base_model: Qwen/Qwen2.5-7B-Instruct
architecture: qwen
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: float32 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: Qwen/Qwen2.5-Coder-7B-Instruct
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: Qwen/Qwen2.5-Math-7B-Instruct
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
shared_experts:
  - source_model: Qwen/Qwen2.5-7B-Instruct
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
      - "assistant"
      - "fact"
    # (optional, but recommended:)
    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model

I wonder if anybody has had any success building an MoE merge from Qwen2.5 models that I could use as a reference? Thank you so much!
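
For anyone trying to reproduce this, here is a minimal sketch of how the config above would typically be run and the result sanity-checked. The config filename, output directory, and test prompt are placeholders I picked for illustration, not details from the original report:

# Merge (run from a shell; mergekit must be installed):
#   mergekit-moe qwen-moe-config.yaml ./qwen2.5-moe-merged
#
# Then sanity-check the merged model with transformers:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./qwen2.5-moe-merged"  # placeholder: output directory passed to mergekit-moe

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # cast down from the float32 weights the merge writes out
    device_map="auto",
)

# Use the Qwen chat template so the instruct-tuned experts see the expected prompt format.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))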
