Reproducibility issue for finetuning Phi3 Vision on DocVQA dataset #121

qwedaq · 2024-08-01T17:31:54Z

This issue is for a: (mark with an `x`)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I used the following command to finetune Phi-3 Vision with LoRA:
CUDA_VISIBLE_DEVICES=6 python3 finetune_hf_trainer_docvqa.py --full_train --use_lora --bf16 --lora_rank=32 --lora_alpha_ratio=16 --batch_size=64 --learning_rate=2e-4 --num_train_epochs=2 --freeze_vision_model

Any log messages given by the failure

Expected/desired behavior

The README reports an ANLS of 82.46 after finetuning, but the ANLS I got is 75.68. In fact, the ANLS score before finetuning is 77.02, so finetuning lowered the score.

OS and Version?

Linux with CUDA 12.2

leestott · 2024-08-06T09:37:28Z

@qwedaq sorry, can you confirm which sample you're running from the cookbook?

qwedaq · 2024-08-07T04:13:51Z

Hi @leestott, I am running the following script for Phi3 Vision from the cookbook

https://github.com/microsoft/Phi-3CookBook/blob/main/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_docvqa.py

leestott · 2024-08-09T12:16:08Z

@ChenRocks, can you please look into this with your finetuning sample?

ChenRocks · 2024-08-11T05:37:54Z

Hi @qwedaq, thanks for reporting your results. Note that all deep learning training has inherent randomness; therefore, a re-run can produce a slightly different accuracy.

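However, in your case, the drop is significant. The cause is the --lora_alpha_ratio=16 hyperparameter: this flag sets the ratio of lora_alpha to lora_rank, not lora_alpha itself. With --lora_rank=32, the correct way of setting lora_alpha to 16 is --lora_alpha_ratio=0.5 (32 × 0.5 = 16), whereas a ratio of 16 yields lora_alpha = 512. See this line.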

I know this may not be obvious for users. I will improve the document later. Thanks!
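To make the arithmetic concrete, here is a minimal sketch of how the two flags combine. It assumes the script derives lora_alpha as round(lora_rank * lora_alpha_ratio), which is what the numbers above imply. The helper names `effective_lora_alpha` and `lora_scaling` are hypothetical and are not taken from the cookbook script.

```python
def effective_lora_alpha(lora_rank: int, lora_alpha_ratio: float) -> int:
    """Assumed relation: lora_alpha = round(lora_rank * lora_alpha_ratio)."""
    return round(lora_rank * lora_alpha_ratio)


def lora_scaling(lora_rank: int, lora_alpha_ratio: float) -> float:
    """Standard LoRA multiplies the low-rank update by lora_alpha / rank."""
    return effective_lora_alpha(lora_rank, lora_alpha_ratio) / lora_rank


# Command from the report: --lora_rank=32 --lora_alpha_ratio=16
reported_alpha = effective_lora_alpha(32, 16)

# Suggested fix: --lora_rank=32 --lora_alpha_ratio=0.5
intended_alpha = effective_lora_alpha(32, 0.5)

print("reported:", reported_alpha, "scaling:", lora_scaling(32, 16))
print("intended:", intended_alpha, "scaling:", lora_scaling(32, 0.5))
```

Under this assumption the reported command scales the LoRA update 32 times more strongly than intended (16.0 versus 0.5), which would be consistent with the drop in ANLS.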

qwedaq · 2024-08-13T04:31:21Z

This is working now. I am able to reproduce the results. Thank you

qwedaq · 2024-09-10T07:44:01Z

I just had a quick question related to the same code. I would like to know why Phi3V reports the final results using the ANLS metric and does not use more modern metrics such as BLEU, BERT or ROUGE-L?
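For context on the metric in question, below is a minimal sketch of ANLS (Average Normalized Levenshtein Similarity) as it is commonly defined for DocVQA: each prediction is scored against its closest ground-truth answer by normalized edit distance, and the score is set to zero once that distance reaches a threshold, usually 0.5. This is an illustration only, not the cookbook's evaluation code. The function names and the lowercase/strip normalization are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed row by row."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best thresholded similarity.

    predictions: list of predicted answer strings.
    references: list of lists of acceptable answers, one list per question.
    """
    scores = []
    for pred, answers in zip(predictions, references):
        pred_norm = pred.strip().lower()
        best = 0.0
        for ans in answers:
            ans_norm = ans.strip().lower()
            longest = max(len(pred_norm), len(ans_norm))
            # Normalized distance in [0, 1]; two empty strings count as a match.
            nl = levenshtein(pred_norm, ans_norm) / longest if longest else 0.0
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


# A one-character slip keeps most of the credit; a wrong answer gets none.
print(anls(["Invoice 1284", "cat"], [["invoice 1234"], ["dog"]]))
```

The sketch shows that the metric works at the character level on short answers, so small recognition errors are penalized only lightly while unrelated answers score zero.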

leestott added the `question` label Aug 6, 2024

leestott assigned ChenRocks Aug 9, 2024

qwedaq closed this as completed Aug 13, 2024

qwedaq reopened this Sep 10, 2024
