Add InstructMol dataset #9975

Merged
merged 5 commits into from
Jan 24, 2025

@xnuohz (Contributor) commented Jan 23, 2025

Issue

#9699

Detail

Comparison between InstructMol and MoleculeGPT:

  • Data: the same data structure but different data sources. Each sample is a molecular graph + SMILES sequence + question + answer.
  • Model: almost the same model paradigm, multimodal + QA.

So in this PR I only implemented the InstructMol dataset and added it to the MoleculeGPT model example.
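
The shared sample layout described in the comparison above (molecular graph + SMILES sequence + question + answer) can be sketched roughly as follows. This is only an illustration, not the code from this PR: the class name `MoleculeQASample`, its field names, and the ethanol example values are hypothetical, and the graph is reduced to plain Python lists instead of a PyG `Data` object.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MoleculeQASample:
    """Hypothetical container for one graph + SMILES + question + answer record."""
    atom_numbers: List[int]        # one atomic number per heavy atom (node features)
    bonds: List[Tuple[int, int]]   # undirected bonds as (atom_i, atom_j) index pairs
    smiles: str                    # SMILES string for the same molecule
    question: str                  # instruction / question text
    answer: str                    # target answer text

    def edge_index(self) -> List[List[int]]:
        """Return every bond in both directions as [sources, targets]."""
        src = [i for i, _ in self.bonds] + [j for _, j in self.bonds]
        dst = [j for _, j in self.bonds] + [i for i, _ in self.bonds]
        return [src, dst]


# Made-up example record: ethanol, heavy atoms only (C-C-O).
sample = MoleculeQASample(
    atom_numbers=[6, 6, 8],
    bonds=[(0, 1), (1, 2)],
    smiles="CCO",
    question="What is the molecular formula of this molecule?",
    answer="C2H6O",
)
```

Because both datasets would fit a record of this shape, the same multimodal QA model can consume either one, which is why only the dataset needed to be added.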

@xnuohz xnuohz requested a review from wsad1 as a code owner January 23, 2025 13:31
@xnuohz (Contributor, Author) commented Jan 23, 2025

python examples/llm/molecule_gpt.py --epochs 2 --batch_size 64
Setting up 'TinyLlama/TinyLlama-1.1B-Chat-v0.1' with configuration: {'revision': 'main', 'max_memory': {0: '23GiB'}, 'low_cpu_mem_usage': True, 'device_map': 'auto', 'torch_dtype': torch.bfloat16}
2025-01-23 19:51:11.429499: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-23 19:51:11.448023: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-23 19:51:11.448047: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-23 19:51:11.448085: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-23 19:51:11.451998: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-23 19:51:11.840210: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/ubuntu/Softwares/anaconda3/envs/pyg-dev/lib/python3.9/site-packages/accelerate/utils/imports.py:313: UserWarning: Intel Extension for PyTorch 2.1 needs to work with PyTorch 2.1.*, but PyTorch 2.5.1 is found. Please switch to the matching version and run again.
  warnings.warn(
Some weights of RobertaModel were not initialized from the model checkpoint at DeepChem/ChemBERTa-77M-MTR and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Total Preparation Time: 6.509785s
Training beginning...
Epoch: 1|2: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4083/4083 [42:57<00:00,  1.58it/s]
Epoch: 1|2, Train loss: 1.378120, Val loss: 1.275135
Epoch: 2|2: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 4083/4083 [42:50<00:00,  1.59it/s]
Epoch: 2|2, Train loss: 1.248772, Val loss: 1.246007
Total Training Time: 5346.377838s
Test loss: 1.250804
Total Time: 5451.700063s
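
For anyone who wants to sanity-check a run like the one above, here is a small sketch that parses the per-epoch loss lines and re-derives the throughput from the progress bar. The regular expression and helper name `parse_epochs` are my own, not part of the example script; the numbers are copied from the log.

```python
import re

# Loss lines copied from the training log above.
LOG = """\
Epoch: 1|2, Train loss: 1.378120, Val loss: 1.275135
Epoch: 2|2, Train loss: 1.248772, Val loss: 1.246007
"""

EPOCH_RE = re.compile(
    r"Epoch: (\d+)\|(\d+), Train loss: ([\d.]+), Val loss: ([\d.]+)"
)


def parse_epochs(log: str):
    """Extract (epoch, train_loss, val_loss) records from the log text."""
    return [
        {
            "epoch": int(m.group(1)),
            "train_loss": float(m.group(3)),
            "val_loss": float(m.group(4)),
        }
        for m in EPOCH_RE.finditer(log)
    ]


rows = parse_epochs(LOG)

# Progress bar for epoch 1: 4083 steps in 42:57 with --batch_size 64.
steps_per_epoch = 4083
batch_size = 64
epoch_seconds = 42 * 60 + 57
steps_per_second = steps_per_epoch / epoch_seconds
# Upper bound only: the last batch of an epoch may be smaller than 64.
max_train_examples = steps_per_epoch * batch_size
```

The derived rate agrees with the 1.58 it/s shown in the progress bar, and the step count implies at most 261,312 training examples per epoch.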

@puririshi98 (Contributor) commented Jan 23, 2025

This LGTM and I'm fine to merge it ASAP. I'm just curious what the reasoning is for making InstructMol the default in the argparser.
I just think it's a bit confusing since the example says MoleculeGPT, but if there is a strong reason I haven't thought of, let me know. I'll merge once I understand.

@xnuohz (Contributor, Author) commented Jan 24, 2025

Oops, changed it back. The default setting was for testing purposes.
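
A minimal sketch of the default being discussed, assuming the example selects its dataset through a command-line flag. The flag name `--dataset_name`, its choices, and the default values for `--epochs` and `--batch_size` are assumptions for illustration; only the `--epochs` and `--batch_size` flag names appear in the command shown earlier in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose default dataset matches the example's name."""
    parser = argparse.ArgumentParser(description="MoleculeGPT example (sketch)")
    # Hypothetical flag: MoleculeGPT stays the default, InstructMol is opt-in.
    parser.add_argument(
        "--dataset_name",
        choices=["MoleculeGPT", "InstructMol"],
        default="MoleculeGPT",
    )
    # Default values below are placeholders, not the script's real defaults.
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch_size", type=int, default=2)
    return parser


# No flags given: falls back to the MoleculeGPT default.
default_args = build_parser().parse_args([])

# Explicit opt-in to the new dataset, e.g. for testing.
test_args = build_parser().parse_args(
    ["--dataset_name", "InstructMol", "--epochs", "2", "--batch_size", "64"]
)
```

Keeping MoleculeGPT as the default means running the script with no dataset flag behaves as its name suggests, while the new dataset stays one flag away.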

@puririshi98 (Contributor) left a comment

now LGTM

@puririshi98 puririshi98 merged commit ed89c94 into pyg-team:master Jan 24, 2025
15 of 16 checks passed