KeyError: 'lm_head.weight' while lora extraction #484

Open
ehristoforu opened this issue Jan 10, 2025 · 3 comments

@ehristoforu
I got this error during LoRA extraction. Parameters:
finetuned_model: ngxson/MiniThinky-v2-1B-Llama-3.2 (llama3.2-1b, fp16, not merged)
base_model: unsloth/Llama-3.2-1B-Instruct (base of finetuned_model, fp16)
rank: 32
Full logs:

 99% 146/147 [02:09<00:00, 1.12it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-extract-lora", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mergekit/scripts/extract_lora.py", line 574, in main
    lora_weights, ranks = extract_lora(
  File "/usr/local/lib/python3.10/dist-packages/mergekit/scripts/extract_lora.py", line 244, in extract_lora
    base_weight = base_loader.get_tensor(f"{module_name}.weight")
  File "/usr/local/lib/python3.10/dist-packages/mergekit/io/lazy_tensor_loader.py", line 127, in get_tensor
    raise KeyError(key)
KeyError: 'lm_head.weight'

The same error occurs when extracting from Qwen models. If it fails for every model, what is this feature for?
I hope someone can help me.
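One quick way to check whether a checkpoint actually ships an independent `lm_head.weight` is to inspect the `weight_map` of its `model.safetensors.index.json` (the standard Hugging Face sharded-checkpoint index). A minimal sketch, with the index inlined as a trimmed example rather than read from a real file:

```python
import json

# Example weight map in the format of model.safetensors.index.json,
# trimmed to two entries for illustration.
index_json = """
{
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00001.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00001.safetensors"
  }
}
"""

weight_map = json.loads(index_json)["weight_map"]

# Tied-embedding models list no lm_head.weight in the index at all:
print("lm_head.weight" in weight_map)  # → False for this checkpoint
```

If the key is absent from the index of the base model, the `KeyError` above is expected: the loader is asked for a tensor the checkpoint never stored.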

@jrruethe

jrruethe commented Jan 15, 2025

I am getting the same error, for all of the following model types:

  • Qwen_Qwen2.5-0.5B
  • Qwen_Qwen2.5-1.5B
  • Qwen_Qwen2.5-3B
  • meta-llama_Llama-3.2-1B
  • meta-llama_Llama-3.2-3B

I believe these two issues offer hints:

In particular, it has something to do with lm_head going by a different name in these architectures. @David-AU-github had a solution for Llama 3.2 3B, which may also work for Qwen with a few changes. It would appear that adding aliases would not break anything, if I am understanding this correctly.

Qwen2 0.5B and 1.5B do not have an independent lm_head layer, it is shared.
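For checkpoints with tied embeddings like these, the alias idea above amounts to falling back from `lm_head.weight` to the shared embedding tensor. A minimal sketch of such a lookup, simulating the tensor loader with a plain dict (the alias table and function names here are illustrative, not mergekit's actual code):

```python
# Aliases to try when a key is missing: tied-embedding models store the
# output projection only once, as the input embedding.
ALIASES = {
    "lm_head.weight": ["model.embed_tokens.weight", "embed_tokens.weight"],
}

def get_tensor(tensors, key):
    """Return tensors[key], falling back to known aliases for tied weights."""
    if key in tensors:
        return tensors[key]
    for alias in ALIASES.get(key, []):
        if alias in tensors:
            return tensors[alias]
    raise KeyError(key)

# A tied-embedding checkpoint: no independent lm_head.weight entry.
checkpoint = {"model.embed_tokens.weight": "embedding-matrix"}
print(get_tensor(checkpoint, "lm_head.weight"))  # resolves via the alias
```

Whether the extracted LoRA should then carry an `lm_head` delta at all for a tied model is a separate question; the fallback only avoids the crash.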

@jrruethe

jrruethe commented Jan 15, 2025

Oh, it appears someone else ran into this issue and made a PR, solving it a different way: https://github.com/arcee-ai/mergekit/pull/483/files

Similar issue: #447

@jrruethe

Unfortunately, that PR didn't work for me; it spits out messages like:

Some weights of LlamaForCausalLM were not initialized from the model checkpoint at meta-llama_Llama-3.2-1B and are newly initialized: ['embed_tokens.weight', 'layers.0.input_layernorm.weight', 'layers.0.mlp.down_proj.weight', 'layers.0.mlp.gate_proj.weight', 'layers.0.mlp.up_proj.weight', 'layers.0.post_attention_layernorm.weight', 'layers.0.self_attn.k_proj.weight', 'layers.0.self_attn.o_proj.weight', 'layers.0.self_attn.q_proj.weight', 'layers.0.self_attn.v_proj.weight', 'layers.1.input_layernorm.weight', 'layers.1.mlp.down_proj.weight', 'layers.1.mlp.gate_proj.weight', 'layers.1.mlp.up_proj.weight', 'layers.1.post_attention_layernorm.weight', 'layers.1.self_attn.k_proj.weight', 'layers.1.self_attn.o_proj.weight', 'layers.1.self_attn.q_proj.weight', 'layers.1.self_attn.v_proj.weight', 'layers.10.input_layernorm.weight', 'layers.10.mlp.down_proj.weight', 'layers.10

I think it is due to each of those weights having different names from Llama 3.1 8B (the "model." prefix), but I'm not 100% sure.

https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/model.safetensors.index.json

I'm going to tinker with the aliases some more.
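The prefix mismatch described above could in principle be handled by normalizing key names before lookup, trying each key both with and without the `model.` prefix. A sketch of that idea (names made up for illustration; this is not mergekit's actual implementation):

```python
PREFIX = "model."

def find_key(available_keys, key):
    """Return the variant of `key` present in `available_keys`,
    trying both with and without the leading "model." prefix."""
    candidates = [key]
    if key.startswith(PREFIX):
        candidates.append(key[len(PREFIX):])
    else:
        candidates.append(PREFIX + key)
    for cand in candidates:
        if cand in available_keys:
            return cand
    raise KeyError(key)

# Checkpoint saved without the prefix, queried with it:
keys = {"layers.0.self_attn.q_proj.weight", "embed_tokens.weight"}
print(find_key(keys, "model.layers.0.self_attn.q_proj.weight"))
# prints the un-prefixed key that is actually present in the checkpoint
```

This would make the "newly initialized" warnings above go away only if the remaining shape and dtype expectations also line up, so it is a partial fix at best.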

@jrruethe mentioned this issue Jan 21, 2025