Reproducibility issue for finetuning Phi3 Vision on DocVQA dataset #121

Open
qwedaq opened this issue Aug 1, 2024 · 6 comments

@qwedaq

qwedaq commented Aug 1, 2024

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I used the following command to finetune Phi-3 Vision with LoRA:
CUDA_VISIBLE_DEVICES=6 python3 finetune_hf_trainer_docvqa.py --full_train --use_lora --bf16 --lora_rank=32 --lora_alpha_ratio=16 --batch_size=64 --learning_rate=2e-4 --num_train_epochs=2 --freeze_vision_model

Any log messages given by the failure

Screenshot (252)

Expected/desired behavior

The README reports an ANLS of 82.46 after finetuning, but I got 75.68. In fact, the ANLS score before finetuning is 77.02, so finetuning made the score worse.

OS and Version?

Linux with CUDA 12.2

@leestott
Contributor

leestott commented Aug 6, 2024

@qwedaq sorry, can you confirm which sample you're running from the cookbook?

@leestott added the "question" (Further information is requested) label on Aug 6, 2024
@qwedaq
Author

qwedaq commented Aug 7, 2024

> @qwedaq sorry, can you confirm which sample you're running from the cookbook?

Hi @leestott, I am running the following Phi-3 Vision script from the cookbook:

https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_docvqa.py

@leestott
Contributor

leestott commented Aug 9, 2024

@ChenRocks, could you please look into this with your finetuning sample?

@ChenRocks
Contributor

Hi @qwedaq, thanks for reporting your results. Note that all deep learning training has inherent randomness, so a re-run can produce a slightly different accuracy.

However, in your case, the drop is significant. The cause is the --lora_alpha_ratio=16 hyperparameter. The flag is a ratio, not the lora_alpha value itself: with --lora_rank=32, the correct way to set lora_alpha to 16 is --lora_alpha_ratio=0.5. See this line.

I know this may not be obvious to users. I will improve the documentation later. Thanks!
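
To make the arithmetic behind this fix concrete, here is a minimal sketch. It assumes the script derives `lora_alpha` as `round(lora_rank * lora_alpha_ratio)`, which is consistent with the example above (32 × 0.5 = 16) but is not confirmed in this thread. The helper names are hypothetical and are not part of the cookbook script.

```python
def effective_lora_alpha(lora_rank: int, lora_alpha_ratio: float) -> int:
    """Assumed relation (not confirmed by the thread): alpha = rank * ratio."""
    return round(lora_rank * lora_alpha_ratio)


def lora_scaling(lora_rank: int, lora_alpha_ratio: float) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return effective_lora_alpha(lora_rank, lora_alpha_ratio) / lora_rank


if __name__ == "__main__":
    # Compare the setting from the original command with the suggested one.
    for ratio in (16, 0.5):
        alpha = effective_lora_alpha(32, ratio)
        scale = lora_scaling(32, ratio)
        print(f"--lora_rank=32 --lora_alpha_ratio={ratio}: "
              f"lora_alpha={alpha}, scaling={scale}")
```

Under that assumption, the original command would have trained with a `lora_alpha` of 512 instead of the intended 16, so the adapter update would be scaled 32 times more strongly than intended. That is a plausible explanation for the score falling below the pre-finetuning baseline.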

@qwedaq
Author

qwedaq commented Aug 13, 2024

> Hi @qwedaq, thanks for reporting your results. Note that all deep learning training has inherent randomness, so a re-run can produce a slightly different accuracy.
>
> However, in your case, the drop is significant. The cause is the --lora_alpha_ratio=16 hyperparameter. The correct way of setting lora_alpha to 16 is --lora_alpha_ratio=0.5. See this line.
>
> I know this may not be obvious to users. I will improve the documentation later. Thanks!

This is working now, and I am able to reproduce the results. Thank you.

@qwedaq closed this as completed on Aug 13, 2024
@qwedaq
Author

qwedaq commented Sep 10, 2024

I just had a quick question related to the same code. Why does Phi3V report its final results using the ANLS metric rather than more modern metrics such as BLEU, BERT, or ROUGE-L?
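
For context on the metric in question, ANLS (Average Normalized Levenshtein Similarity) is the metric the DocVQA benchmark itself reports, which may be why the README uses it. The sketch below follows the commonly cited definition with a threshold of 0.5. The function names and the lowercase/strip normalization are assumptions made for illustration, and this is not the evaluation code from the cookbook script.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average over questions of the best per-answer similarity.

    predictions: list of predicted answer strings.
    references: list of lists of acceptable ground-truth answers.
    """
    scores = []
    for pred, answers in zip(predictions, references):
        pred_norm = pred.strip().lower()
        best = 0.0
        for ans in answers:
            ans_norm = ans.strip().lower()
            longest = max(len(pred_norm), len(ans_norm))
            nl = levenshtein(pred_norm, ans_norm) / longest if longest else 0.0
            # Similarities at or beyond the threshold distance count as 0.
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

Because the score is based on character-level edit distance, a prediction with a small OCR-style slip (for example "helo" against "hello") still earns partial credit, while an unrelated answer scores zero. N-gram overlap metrics such as BLEU behave differently on the very short answers typical of document question answering.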

@qwedaq reopened this on Sep 10, 2024