Report issues regarding the architecture-agnostic branch. #445

Open
win10ogod opened this issue Oct 24, 2024 · 3 comments

Comments

@win10ogod

When performing a passthrough merge, the resulting model is severely shrunk. For example, the original bf16 model is 7 GB, but only 1 GB is left after the merge.

@win10ogod
Author

merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 3]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [2, 5]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [4, 7]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [6, 9]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [8, 11]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [10, 13]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [12, 15]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [14, 17]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [16, 19]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [18, 21]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [20, 23]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [22, 25]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [24, 27]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
- sources:
  - layer_range: [26, 28]
    model: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
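
As a rough sanity check on the config above, the sketch below counts how many layers the merged model should end up with. It assumes each `layer_range` is half-open (start inclusive, end exclusive); that reading is an assumption here and is not confirmed in this thread. The helper name `total_layers` is hypothetical and not part of any library.

```python
# Layer ranges copied from the passthrough config above.
LAYER_RANGES = [
    (0, 3), (2, 5), (4, 7), (6, 9), (8, 11), (10, 13), (12, 15),
    (14, 17), (16, 19), (18, 21), (20, 23), (22, 25), (24, 27), (26, 28),
]


def total_layers(layer_ranges):
    """Count stacked layers, assuming half-open [start, end) ranges."""
    return sum(end - start for start, end in layer_ranges)


def distinct_layers(layer_ranges):
    """Count distinct source layer indices referenced by the ranges."""
    return len({i for start, end in layer_ranges for i in range(start, end)})


if __name__ == "__main__":
    print("stacked layers:", total_layers(LAYER_RANGES))
    print("distinct source layers:", distinct_layers(LAYER_RANGES))
```

Under that assumption, the overlapping slices stack more layer copies than the distinct source layers they reference, so the output would be expected to grow rather than shrink. A much smaller output suggests that tensors are being dropped during the merge.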

@cg123
Collaborator

cg123 commented Oct 26, 2024

Thanks for giving it a try. The architecture-agnostic branch is work in progress at the moment and it's expected that this sort of merge in particular won't work yet.

Merges that don't use layer slicing or the tokenizer functionality should all work though.

@win10ogod
Author

> Thanks for giving it a try. The architecture-agnostic branch is work in progress at the moment and it's expected that this sort of merge in particular won't work yet.
>
> Merges that don't use layer slicing or the tokenizer functionality should all work though.

Looking forward to a better version.
