Llama-3.2-1B-Instruct - reproduce MMLU score of 49.3 #243

Alexey234432 · 2024-12-12T12:12:48Z

Hi,

Thanks for the model release.

I am using the Llama-3.2-1B-Instruct model with Hugging Face and can't reproduce the reported MMLU score of 49.3: my 5-shot evaluation ended up at 46.65 instead. I wonder what is wrong with my setup. Are there any examples of exactly how to compose the prompt?
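For reference, here is a minimal sketch of the 5-shot prompt layout used by the original MMLU evaluation code (Hendrycks et al.): a subject header, five solved dev-set examples, then the test question ending in `Answer:`. The helper names (`format_subject`, `format_example`, `build_prompt`) are hypothetical. Whether Meta used exactly this layout, or wrapped it in the Llama 3 chat template for the Instruct model, is an assumption I have not been able to confirm.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers; layout follows the original MMLU evaluation prompt.
CHOICES = ["A", "B", "C", "D"]


def format_subject(subject: str) -> str:
    # "high_school_physics" -> "high school physics"
    return " ".join(subject.split("_"))


def format_example(question: str, options: Sequence[str], answer: Optional[str] = None) -> str:
    # Question, lettered options, then "Answer:"; few-shot demos also get the gold letter.
    prompt = question
    for letter, option in zip(CHOICES, options):
        prompt += f"\n{letter}. {option}"
    prompt += "\nAnswer:"
    if answer is not None:
        prompt += f" {answer}\n\n"
    return prompt


def build_prompt(
    subject: str,
    dev_examples: List[Tuple[str, Sequence[str], str]],
    test_question: str,
    test_options: Sequence[str],
    k: int = 5,
) -> str:
    # Subject header, k solved dev examples, then the unanswered test question.
    header = (
        "The following are multiple choice questions (with answers) about "
        f"{format_subject(subject)}.\n\n"
    )
    shots = "".join(format_example(q, opts, ans) for q, opts, ans in dev_examples[:k])
    return header + shots + format_example(test_question, test_options)


if __name__ == "__main__":
    dev = [("What is 2 + 2?", ["3", "4", "5", "6"], "B")]
    print(build_prompt("elementary_mathematics", dev, "What is 3 + 3?", ["5", "6", "7", "8"], k=1))
```

A common way to read off the prediction is to compare the model's next-token scores for the four option letters; small details such as the header wording or the space before the letter may change the final score.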

I've used https://github.com/QwenLM/Qwen/blob/main/eval/evaluate_mmlu.py as a starting point and modified the model initialization to:

def load_models_tokenizer(args):
    import torch
    from transformers import pipeline

    model_id = "meta-llama/Llama-3.2-1B-Instruct"
    # Build a text-generation pipeline only to obtain the model and tokenizer
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.float32,
        device_map="auto",
    )

    tokenizer = pipe.tokenizer
    model = pipe.model
    # Reuse EOS as the pad token so batched inputs can be padded
    tokenizer.pad_token = tokenizer.eos_token

    return model, tokenizer

Any pointers on what might be wrong with my setup, or examples of the exact prompt I should use?
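One more setup detail that can account for a gap of a few points is how per-subject results are aggregated. The sketch below (hypothetical function name, made-up toy counts) computes both the micro average over all questions and the macro average over subjects. Because MMLU subjects differ a lot in size, the two can disagree; which one the published 49.3 corresponds to should be checked against the model card rather than assumed.

```python
from typing import Dict, Tuple


def micro_and_macro_accuracy(results: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """results maps subject -> (num_correct, num_questions)."""
    total_correct = sum(c for c, _ in results.values())
    total_questions = sum(n for _, n in results.values())
    # Micro: every question weighs the same, so large subjects dominate.
    micro = total_correct / total_questions
    # Macro: every subject weighs the same, regardless of its size.
    macro = sum(c / n for c, n in results.values()) / len(results)
    return micro, macro


if __name__ == "__main__":
    toy = {"subject_a": (90, 100), "subject_b": (5, 10)}  # toy counts, not real results
    micro, macro = micro_and_macro_accuracy(toy)
    print(f"micro={micro:.4f} macro={macro:.4f}")
```

With the toy counts it prints `micro=0.8636 macro=0.7000`.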

Thank you.
