Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle weight aliases #490

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jrruethe
Copy link

@jrruethe jrruethe commented Jan 21, 2025

This is a quick modification to the extract_lora.py script to utilize the aliases from the weight info that is loaded from the architecture JSON.

It is my attempt at a solution to #484 and provides an alternative to the hardcoding found in #483.

I have tested this on the following:

  • Qwen_Qwen2.5-0.5B
  • Qwen_Qwen2.5-1.5B
  • Qwen_Qwen2.5-3B
  • meta-llama_Llama-3.2-1B
  • meta-llama_Llama-3.2-3B
  • Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 (This 8B model used "model.embed_tokens.weight" instead of "lm_head.weight", and this change accounts for the difference via the alias)

Tests:

  • Extraction works, without any "lm_head" errors.
  • Reapplying the extracted lora works.

Once I complete the above tests, I will clear the "draft".

Notes:
I still get messages stating the following:

Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at cognitivecomputations_Dolphin3.0-Qwen2.5-0.5B and are newly initialized: ['embed_tokens.weight', 'layers.0.input_layernorm.weight', 'layers.0.mlp.down_proj.weight' ...

However, after looking into it some more, I believe this message is not an issue. I think it is coming from this line due to essentially creating an empty model? I'm not 100% sure.

Interestingly, I still receive an error extracting the lora from this with specific message of "AttributeError: 'LlamaModel' object has no attribute 'lm_head'" which I find a bit bizarre. I got this message before my change too, so it is unrelated, but I'm a bit surprised this change didn't fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant