KeyError: 'lm_head.weight' while lora extraction #484

Open
ehristoforu opened this issue Jan 10, 2025 · 3 comments

@ehristoforu
I got this error during LoRA extraction. Parameters:
finetuned_model: ngxson/MiniThinky-v2-1B-Llama-3.2 (llama3.2-1b, fp16, not merged)
base_model: unsloth/Llama-3.2-1B-Instruct (base of finetuned_model, fp16)
rank: 32
Full logs:

 99% 146/147 [02:09<00:00, 1.12it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-extract-lora", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/scripts/extract_lora.py", line 574, in main
    lora_weights, ranks = extract_lora(
  File "/usr/local/lib/python3.10/dist-packages/mergekit/scripts/extract_lora.py", line 244, in extract_lora
    base_weight = base_loader.get_tensor(f"{module_name}.weight")
  File "/usr/local/lib/python3.10/dist-packages/mergekit/io/lazy_tensor_loader.py", line 127, in get_tensor
    raise KeyError(key)
KeyError: 'lm_head.weight'

The same error occurs when extracting from Qwen models. If it fails for every model, what is this feature for?
I hope someone can help me.
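One quick way to check whether a checkpoint actually ships an independent `lm_head.weight` is to inspect the `weight_map` of its `model.safetensors.index.json` (the standard Hugging Face sharded-checkpoint index). A minimal sketch, with the index inlined as a trimmed example rather than read from a real file:

```python
import json

# Example weight map in the format of model.safetensors.index.json,
# trimmed to two entries for illustration.
index_json = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00001.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00001.safetensors"
  }
}
"""

weight_map = json.loads(index_json)["weight_map"]

# Tied-embedding models list no lm_head.weight in the index at all:
print("lm_head.weight" in weight_map)  # → False for this checkpoint
```

If the key is absent from the index of the base model, the `KeyError` above is expected: the loader is asked for a tensor the checkpoint never stored.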

@jrruethe

jrruethe commented Jan 15, 2025

I am getting the same error, for all of the following model types:

  • Qwen_Qwen2.5-0.5B
  • Qwen_Qwen2.5-1.5B
  • Qwen_Qwen2.5-3B
  • meta-llama_Llama-3.2-1B
  • meta-llama_Llama-3.2-3B

I believe these two issues offer hints:

In particular, it has something to do with lm_head going by a different name in these architectures. @David-AU-github had a solution for Llama 3.2 3B, which may also work for Qwen with a few changes. It would appear that adding aliases would not break anything, if I am understanding this correctly.

Qwen2 0.5B and 1.5B do not have an independent lm_head layer, it is shared.
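For checkpoints with tied embeddings like these, the alias idea above amounts to falling back from `lm_head.weight` to the shared embedding tensor. A minimal sketch of such a lookup, simulating the tensor loader with a plain dict (the alias table and function names here are illustrative, not mergekit's actual code):

```python
# Aliases to try when a key is missing: tied-embedding models store the
# output projection only once, as the input embedding.
ALIASES = {
    "lm_head.weight": ["model.embed_tokens.weight", "embed_tokens.weight"],
}

def get_tensor(tensors, key):
    """Return tensors[key], falling back to known aliases for tied weights."""
    if key in tensors:
        return tensors[key]
    for alias in ALIASES.get(key, []):
        if alias in tensors:
            return tensors[alias]
    raise KeyError(key)

# A tied-embedding checkpoint: no independent lm_head.weight entry.
checkpoint = {"model.embed_tokens.weight": "embedding-matrix"}
print(get_tensor(checkpoint, "lm_head.weight"))  # resolves via the alias
```

Whether the extracted LoRA should then carry an `lm_head` delta at all for a tied model is a separate question; the fallback only avoids the crash.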

@jrruethe

jrruethe commented Jan 15, 2025

Oh, it appears someone else ran into this issue and made a PR, solving it a different way: https://github.com/arcee-ai/mergekit/pull/483/files

Similar issue: #447

@jrruethe

Unfortunately, that PR didn't work for me; it spits out messages like:

Some weights of LlamaForCausalLM were not initialized from the model checkpoint at meta-llama_Llama-3.2-1B and are newly initialized: ['embed_tokens.weight', 'layers.0.input_layernorm.weight', 'layers.0.mlp.down_proj.weight', 'layers.0.mlp.gate_proj.weight', 'layers.0.mlp.up_proj.weight', 'layers.0.post_attention_layernorm.weight', 'layers.0.self_attn.k_proj.weight', 'layers.0.self_attn.o_proj.weight', 'layers.0.self_attn.q_proj.weight', 'layers.0.self_attn.v_proj.weight', 'layers.1.input_layernorm.weight', 'layers.1.mlp.down_proj.weight', 'layers.1.mlp.gate_proj.weight', 'layers.1.mlp.up_proj.weight', 'layers.1.post_attention_layernorm.weight', 'layers.1.self_attn.k_proj.weight', 'layers.1.self_attn.o_proj.weight', 'layers.1.self_attn.q_proj.weight', 'layers.1.self_attn.v_proj.weight', 'layers.10.input_layernorm.weight', 'layers.10.mlp.down_proj.weight', 'layers.10

I think it is due to each of those weights having different names from Llama 3.1 8B (the "model." prefix), but I'm not 100% sure.

https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/model.safetensors.index.json

I'm going to tinker with the aliases some more.
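The prefix mismatch described above could in principle be handled by normalizing key names before lookup, trying each key both with and without the `model.` prefix. A sketch of that idea (names made up for illustration; this is not mergekit's actual implementation):

```python
PREFIX = "model."

def find_key(available_keys, key):
    """Return the variant of `key` present in `available_keys`,
    trying both with and without the leading "model." prefix."""
    candidates = [key]
    if key.startswith(PREFIX):
        candidates.append(key[len(PREFIX):])
    else:
        candidates.append(PREFIX + key)
    for cand in candidates:
        if cand in available_keys:
            return cand
    raise KeyError(key)

# Checkpoint saved without the prefix, queried with it:
keys = {"layers.0.self_attn.q_proj.weight", "embed_tokens.weight"}
print(find_key(keys, "model.layers.0.self_attn.q_proj.weight"))
# prints the un-prefixed key that is actually present in the checkpoint
```

This would make the "newly initialized" warnings above go away only if the remaining shape and dtype expectations also line up, so it is a partial fix at best.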

@jrruethe mentioned this issue Jan 21, 2025