Hi team, thank you for the awesome repo; I have had a lot of success merging models. However, when I try to merge Qwen models into an MoE with the following configuration, the output is broken.
Here is the config that I used:
base_model: Qwen/Qwen2.5-7B-Instruct
architecture: qwen
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: float32 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: Qwen/Qwen2.5-Coder-7B-Instruct
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: Qwen/Qwen2.5-Math-7B-Instruct
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
shared_experts:
  - source_model: Qwen/Qwen2.5-7B-Instruct
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
      - "assistant"
      - "fact"
    # (optional, but recommended:)
    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
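For context, this is roughly how I run the merge and sanity-check the result; the output directory and the test prompt below are just placeholders, not anything mergekit produces itself:

```python
# Sketch only: paths and prompt are placeholders.
#
# Step 1: run the merge with the mergekit-moe CLI:
#   mergekit-moe qwen-moe.yaml ./qwen2.5-moe-merged
#
# Step 2: quick generation check on the merged model with transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./qwen2.5-moe-merged"  # wherever mergekit-moe wrote its output
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that counts vowels in a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```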
I wonder if anybody has had success merging Qwen2.5 models into an MoE that I could use as a reference? Thank you so much.