Hi team, thank you for the awesome repo; I have had a lot of success merging models. However, when I try to merge Qwen models into an MoE with the following configuration, the output is broken.
Here is the config that I used:
base_model: Qwen/Qwen2.5-7B-Instruct
architecture: qwen
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: float32 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: Qwen/Qwen2.5-Coder-7B-Instruct
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
  - source_model: Qwen/Qwen2.5-Math-7B-Instruct
    positive_prompts:
      - "reason"
      - "math"
      - "mathematics"
      - "solve"
      - "count"
shared_experts:
  - source_model: Qwen/Qwen2.5-7B-Instruct
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
      - "assistant"
      - "fact"
    # (optional, but recommended:)
    residual_scale: 0.1 # downweight output from shared expert to prevent overcooking the model
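For context, this is roughly how I run the merge and sanity-check the result; the output directory and the test prompt below are just placeholders, not anything mergekit produces itself:

```python
# Sketch only: paths and prompt are placeholders.
#
# Step 1: run the merge with the mergekit-moe CLI:
#   mergekit-moe qwen-moe.yaml ./qwen2.5-moe-merged
#
# Step 2: quick generation check on the merged model with transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./qwen2.5-moe-merged"  # wherever mergekit-moe wrote its output
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that counts vowels in a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```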
I wonder if anybody has had success merging Qwen2.5 models into an MoE that I could use as a reference? Thank you so much.