Handle weight aliases #490

jrruethe · 2025-01-21T22:31:56Z

This is a quick modification to the extract_lora.py script so that it uses the aliases from the weight info loaded from the architecture JSON.

It is my attempt at a solution to #484 and provides an alternative to the hardcoding found in #483.
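For illustration only, here is a minimal sketch of the alias lookup idea: try the canonical tensor name first, then fall back to each alias that the architecture definition lists. The JSON layout (`post_weights`, `name`, `aliases`) and the helpers `WeightInfo`, `load_post_weights`, and `resolve_weight_name` are hypothetical stand-ins and are not guaranteed to match mergekit's real schema or API.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical excerpt of an architecture JSON entry; field names are assumed.
ARCH_JSON = """
{
  "post_weights": [
    {"name": "lm_head.weight", "aliases": ["model.embed_tokens.weight"]}
  ]
}
"""


@dataclass(frozen=True)
class WeightInfo:  # hypothetical stand-in, not mergekit's own class
    name: str
    aliases: Tuple[str, ...] = ()


def load_post_weights(text: str) -> List[WeightInfo]:
    """Parse weight entries (name plus optional aliases) from the JSON text."""
    data = json.loads(text)
    return [
        WeightInfo(w["name"], tuple(w.get("aliases", [])))
        for w in data["post_weights"]
    ]


def resolve_weight_name(info: WeightInfo, available: Iterable[str]) -> Optional[str]:
    """Return the canonical name if the checkpoint has it, else the first alias found."""
    keys = set(available)
    for candidate in (info.name, *info.aliases):
        if candidate in keys:
            return candidate
    return None  # neither the name nor any alias exists in the checkpoint


(lm_head,) = load_post_weights(ARCH_JSON)
tied_checkpoint = ["model.embed_tokens.weight", "model.norm.weight"]
untied_checkpoint = ["lm_head.weight", "model.embed_tokens.weight", "model.norm.weight"]
print(resolve_weight_name(lm_head, tied_checkpoint))    # model.embed_tokens.weight
print(resolve_weight_name(lm_head, untied_checkpoint))  # lm_head.weight
```

Because the lookup order is fixed (canonical name, then aliases in declaration order), a checkpoint that does ship `lm_head.weight` never silently falls back to the embedding matrix.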

I have tested this on the following:

Qwen_Qwen2.5-0.5B
Qwen_Qwen2.5-1.5B
Qwen_Qwen2.5-3B
meta-llama_Llama-3.2-1B
meta-llama_Llama-3.2-3B
Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 (this 8B model uses "model.embed_tokens.weight" instead of "lm_head.weight"; this change handles the difference through the alias)

Tests:

Extraction works without any "lm_head" errors.
Reapplying the extracted LoRA works.

Once I complete the above tests, I will remove the draft status.
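The two tests amount to an extract-then-reapply round trip. As a minimal sketch of what that round trip checks, assuming the extraction is a truncated SVD of the weight delta (an assumption for illustration; this is not the code in extract_lora.py), reapplying a rank-r LoRA to the base weight should recover the fine-tuned weight whenever the delta really has rank r. `extract_lora` and `apply_lora` below are hypothetical helper names.

```python
import numpy as np


def extract_lora(w_base: np.ndarray, w_tuned: np.ndarray, rank: int):
    """Hypothetical helper: approximate (w_tuned - w_base) as b @ a via truncated SVD."""
    delta = w_tuned - w_base
    u, s, vh = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s          # shape (out_features, rank)
    a = sqrt_s[:, None] * vh[:rank]   # shape (rank, in_features)
    return a, b


def apply_lora(w_base: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: add the low-rank update back onto the base weight."""
    return w_base + b @ a


rng = np.random.default_rng(0)
w_base = rng.standard_normal((8, 6))
# Build a "fine-tuned" weight whose delta from the base is exactly rank 2.
true_delta = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
w_tuned = w_base + true_delta

a, b = extract_lora(w_base, w_tuned, rank=2)
w_reapplied = apply_lora(w_base, a, b)
print(np.allclose(w_reapplied, w_tuned))  # True
```

With a real fine-tune the delta is generally not exactly low rank, so the reapplied weight is only an approximation whose error depends on the chosen rank.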

Notes:
I still get messages stating the following:

Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at cognitivecomputations_Dolphin3.0-Qwen2.5-0.5B and are newly initialized: ['embed_tokens.weight', 'layers.0.input_layernorm.weight', 'layers.0.mlp.down_proj.weight' ...

However, after looking into it further, I believe this message is not an issue. I think it comes from this line, which essentially creates an empty model, though I'm not 100% sure.

Interestingly, I still receive an error when extracting the LoRA from this, with the specific message "AttributeError: 'LlamaModel' object has no attribute 'lm_head'", which I find a bit bizarre. I got this error before my change too, so it is unrelated, but I'm a bit surprised this change didn't fix it.

Commit 686c96e: Handle weight aliases
