Request for Modified Dataset to Resolve Training Issue : sharegpt4v_instruct_gpt4-vision_cap100k_new.json #41

alichr · 2025-01-02T23:50:20Z

Hi there,

I hope this message finds you well. I am currently working on training the model in the third stage as described in your repository. However, I’ve encountered an issue related to the dataset configuration:

Specifically, I am using the sharegpt4v_instruct_gpt4-vision_cap100k_new.json dataset from [Lin-Chen/ShareGPT4V](https://huggingface.co/datasets/Lin-Chen/ShareGPT4V/tree/main) with the following configuration:

{
    'type': "llava_instruct",
    'ann_file': 'sharegpt4v_instruct_gpt4-vision_cap100k_new.json',
    'img_prefix': 'dataset/sharegpt4v/data',
    'ratio': 0.23,
    'conv_temp': 'llava'
}

After a few iterations, the training process encounters an issue in the dataloader (llava.py). I suspect the problem may be due to a mismatch between the dataset I downloaded and the version you used in your experiments. From your documentation, it seems you might have modified the dataset to align with the training script.

Would it be possible for you to share the modified version of sharegpt4v_instruct_gpt4-vision_cap100k_new.json or provide details about the modifications you made? This would greatly help me resolve the issue and proceed with the training.

Thank you in advance for your time and support! Your work has been invaluable to the community, and I appreciate your efforts in maintaining the repository.

Looking forward to your guidance.

Best regards,
Ali

machuofan · 2025-01-11T07:26:56Z

Hi there, I modified sharegpt4v_instruct_gpt4-vision_cap100k_new.json only because a few images (fewer than 10) have incorrect paths in the original JSON annotations. Unfortunately, I do not have access to my version of the sharegpt4v annotations at the moment. You could try filtering your downloaded sharegpt4v annotations by removing the entries whose images are missing. Hope this helps!
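
For anyone hitting the same dataloader error, the filtering suggested above could look roughly like the sketch below. It is not the script used for the original experiments. It assumes the annotation file is a JSON list of records in the usual LLaVA style, where each record may carry an "image" key holding a path relative to img_prefix. The function names and the output filename are hypothetical.

```python
import json
import os


def filter_missing_images(records, img_prefix):
    """Split records into (kept, dropped) by whether the image file exists."""
    kept, dropped = [], []
    for rec in records:
        # Assumption: the relative image path is stored under "image".
        rel_path = rec.get("image")
        # Records without an "image" key (text-only) are kept unchanged.
        if rel_path is None or os.path.isfile(os.path.join(img_prefix, rel_path)):
            kept.append(rec)
        else:
            dropped.append(rec)
    return kept, dropped


def filter_annotation_file(ann_file, img_prefix, out_file):
    """Write a copy of ann_file that only references images found on disk."""
    with open(ann_file, "r", encoding="utf-8") as f:
        records = json.load(f)
    kept, dropped = filter_missing_images(records, img_prefix)
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False)
    # Return counts so the caller can check how many entries were removed.
    return len(kept), len(dropped)
```

Calling filter_annotation_file("sharegpt4v_instruct_gpt4-vision_cap100k_new.json", "dataset/sharegpt4v/data", "cap100k_filtered.json") and then pointing 'ann_file' in the config at the filtered file should leave only entries whose images can be opened. If far more than about 10 entries are dropped, the img_prefix is probably wrong rather than the annotations.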
