Request for Modified Dataset to Resolve Training Issue: sharegpt4v_instruct_gpt4-vision_cap100k_new.json #41

Open
alichr opened this issue Jan 2, 2025 · 1 comment

alichr commented Jan 2, 2025

Hi there,

I hope this message finds you well. I am currently working on training the model in the third stage as described in your repository. However, I’ve encountered an issue related to the dataset configuration:

Specifically, I am using the sharegpt4v_instruct_gpt4-vision_cap100k_new.json dataset from [Lin-Chen/ShareGPT4V](https://huggingface.co/datasets/Lin-Chen/ShareGPT4V/tree/main) with the following configuration:

{
    'type': "llava_instruct",
    'ann_file': 'sharegpt4v_instruct_gpt4-vision_cap100k_new.json',
    'img_prefix': 'dataset/sharegpt4v/data',
    'ratio': 0.23,
    'conv_temp': 'llava'
}

After a few iterations, the training process encounters an issue in the dataloader (llava.py). I suspect the problem may be due to a mismatch between the dataset I downloaded and the version you used in your experiments. From your documentation, it seems you might have modified the dataset to align with the training script.

Would it be possible for you to share the modified version of sharegpt4v_instruct_gpt4-vision_cap100k_new.json or provide details about the modifications you made? This would greatly help me resolve the issue and proceed with the training.

Thank you in advance for your time and support! Your work has been invaluable to the community, and I appreciate your efforts in maintaining the repository.

Looking forward to your guidance.

Best regards,
Ali

@machuofan
Collaborator

Hi there, I modified sharegpt4v_instruct_gpt4-vision_cap100k_new.json only because several images (fewer than 10) have incorrect paths in the original JSON annotations. Unfortunately, I do not have access to my version of the sharegpt4v annotations at this time. You could try filtering your downloaded sharegpt4v annotations by removing the entries whose images are missing. Hope this helps!
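
The filtering suggested above could look roughly like the sketch below. It is not the script used for the original experiments. It assumes the annotation file is a JSON list of dicts, and that each entry stores its image path (relative to `img_prefix`) under an `"image"` key, as in LLaVA-style annotations; that layout is an assumption and should be checked against the downloaded file. The function names and the output filename are hypothetical.

```python
import json
import os


def filter_missing_images(entries, img_prefix, image_key="image"):
    """Split annotation entries into (kept, dropped) by image availability.

    Assumption: each entry is a dict whose `image_key` value is a path
    relative to `img_prefix`. Entries without that key are kept unchanged.
    """
    kept, dropped = [], []
    for entry in entries:
        rel_path = entry.get(image_key)
        if rel_path is None or os.path.isfile(os.path.join(img_prefix, rel_path)):
            kept.append(entry)
        else:
            dropped.append(entry)
    return kept, dropped


def filter_annotation_file(ann_file, img_prefix, out_file):
    """Write a copy of `ann_file` without entries whose image is missing.

    Returns (number kept, number dropped) so the result can be sanity-checked.
    """
    with open(ann_file, "r", encoding="utf-8") as f:
        entries = json.load(f)
    kept, dropped = filter_missing_images(entries, img_prefix)
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False)
    return len(kept), len(dropped)
```

With the configuration from the first comment, this would be called as `filter_annotation_file('sharegpt4v_instruct_gpt4-vision_cap100k_new.json', 'dataset/sharegpt4v/data', 'cap100k_filtered.json')`, with `ann_file` in the config then pointed at the filtered output. If far more than about 10 entries are dropped, the cause is more likely an incomplete image download or a wrong `img_prefix` than bad annotation paths.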
