Improvements for Papers100m single gpu and single node multi gpu examples (Cugraph, GATConv, better default hyperparams, eval on all ranks) #8173

Merged Mar 27, 2024 (260 commits)

Commits
b45f939
removing unused code
puririshi98 Sep 25, 2023
25bce03
cleaning up
puririshi98 Sep 25, 2023
96719ef
cleaning flake
puririshi98 Sep 25, 2023
7f4b0a1
fix
puririshi98 Sep 25, 2023
c78de79
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 25, 2023
381cfe5
cleaning
puririshi98 Sep 26, 2023
2146bbb
Merge branch 'master' into papers100m-multinode
puririshi98 Sep 27, 2023
0dc57ce
Merge branch 'master' into papers100m-multinode
puririshi98 Sep 28, 2023
2763d0b
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 2, 2023
e63c0c0
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 2, 2023
4da89c1
Rename multigpu_papers100m_gcn.py to singlenode_multigpu_papers100m_g…
puririshi98 Oct 6, 2023
85ece34
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 6, 2023
74da17a
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 6, 2023
78ff70a
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 6, 2023
c590526
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 8, 2023
8edbe0c
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 9, 2023
3ebfeec
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 9, 2023
f9654b3
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 9, 2023
531a313
Merge branch 'master' into papers100m-multinode
puririshi98 Oct 10, 2023
60132dd
upgrading papers100m examples
puririshi98 Oct 10, 2023
0c36a60
upgrading papers100m examples
puririshi98 Oct 10, 2023
32132c9
upgrading papers100m examples
puririshi98 Oct 10, 2023
0570032
upgrading papers100m examples
puririshi98 Oct 10, 2023
90c7baf
upgrading papers100m examples
puririshi98 Oct 10, 2023
c517c7d
upgrading papers100m examples
puririshi98 Oct 10, 2023
57605eb
upgrading papers100m examples
puririshi98 Oct 10, 2023
b1a2162
upgrading papers100m examples
puririshi98 Oct 10, 2023
bac5de5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 10, 2023
e9253c8
upgrading papers100m examples
puririshi98 Oct 10, 2023
6587387
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 10, 2023
0200c77
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 11, 2023
3630c3c
clean up
puririshi98 Oct 11, 2023
59f589f
Merge branch 'cugraph-paper100m' of https://github.com/pyg-team/pytor…
puririshi98 Oct 11, 2023
6465e68
clean up
puririshi98 Oct 11, 2023
51e50a1
clean up
puririshi98 Oct 11, 2023
d62dd72
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 11, 2023
5104b0e
clean up
puririshi98 Oct 11, 2023
49318fb
clean up
puririshi98 Oct 11, 2023
4f5ef31
clean up
puririshi98 Oct 11, 2023
8d8963b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 11, 2023
f3e5a2e
clean up
puririshi98 Oct 11, 2023
bc51b8e
clean up
puririshi98 Oct 11, 2023
5c8c72a
clean up
puririshi98 Oct 11, 2023
163460d
clean up
puririshi98 Oct 11, 2023
5a5df2a
clean up
puririshi98 Oct 11, 2023
6592065
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 11, 2023
7e6ad67
clean up
puririshi98 Oct 12, 2023
b0d00df
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 12, 2023
b3c1b49
Merge branch 'cugraph-paper100m' of https://github.com/pyg-team/pytor…
puririshi98 Oct 12, 2023
c91a215
clean up
puririshi98 Oct 12, 2023
82298fe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 12, 2023
17b0ff5
clean up
puririshi98 Oct 12, 2023
ae41824
Merge branch 'cugraph-paper100m' of https://github.com/pyg-team/pytor…
puririshi98 Oct 12, 2023
0b500ff
cleanup
puririshi98 Oct 12, 2023
4814f16
cleanup
puririshi98 Oct 12, 2023
774364c
cleanup
puririshi98 Oct 12, 2023
bca5950
cleanup
puririshi98 Oct 12, 2023
59c95f7
cleanup
puririshi98 Oct 13, 2023
1185241
cleanup
puririshi98 Oct 13, 2023
6087bf2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 13, 2023
5e8fb25
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 17, 2023
8ed97f0
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 18, 2023
17bf9e7
cleanup
puririshi98 Oct 19, 2023
9397bec
Merge branch 'cugraph-paper100m' of https://github.com/pyg-team/pytor…
puririshi98 Oct 19, 2023
705de9a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 19, 2023
d2bc07a
fixes from Cugraph-PyG lead Alexandria Barghi
puririshi98 Oct 20, 2023
8a33c36
Merge branch 'cugraph-paper100m' of https://github.com/pyg-team/pytor…
puririshi98 Oct 20, 2023
ae8b489
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 20, 2023
173a694
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 20, 2023
5f0b1d7
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 23, 2023
dae1164
notes
puririshi98 Oct 23, 2023
a450e28
notes
puririshi98 Oct 23, 2023
61028fc
cleaning
puririshi98 Oct 23, 2023
de7069f
fixing timer
puririshi98 Oct 25, 2023
cad138a
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 25, 2023
d7b0fd7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
64f3afa
cleaning
puririshi98 Oct 25, 2023
795d8ec
cleaning
puririshi98 Oct 25, 2023
4bcc31e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
c5b89e7
eval on all ranks
puririshi98 Oct 26, 2023
bc1d74c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 26, 2023
c86f852
eval on all ranks
puririshi98 Oct 26, 2023
09a04a3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 26, 2023
171e893
cleanup
puririshi98 Oct 26, 2023
c911f39
cleaning
puririshi98 Oct 26, 2023
8849c7d
cleanup
puririshi98 Oct 26, 2023
d542221
cleaning
puririshi98 Oct 26, 2023
86958fb
cleaning
puririshi98 Oct 26, 2023
b3f8f15
Merge branch 'master' of https://github.com/pyg-team/pytorch_geometri…
puririshi98 Oct 26, 2023
99b7511
cleaning multinode example
puririshi98 Oct 26, 2023
3e353f6
cleaning multinode example
puririshi98 Oct 27, 2023
8b49b75
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 27, 2023
bfa98f9
fixes for single node multigpu cugraph
puririshi98 Oct 30, 2023
9cc4d8e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2023
bde8c82
cleaning up the code shared by cugraph team
puririshi98 Oct 30, 2023
579d47e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2023
ac1df3c
doesnt work...
puririshi98 Oct 30, 2023
5097b7e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 30, 2023
182bf96
Merge branch 'master' into cugraph-paper100m
puririshi98 Oct 31, 2023
700713f
Update CHANGELOG.md
puririshi98 Oct 31, 2023
8722cce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 31, 2023
35b5597
cleaning
puririshi98 Oct 31, 2023
d7d7812
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 31, 2023
5662921
cleaning SNMG example
puririshi98 Oct 31, 2023
48094f9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 31, 2023
7b701ca
cleaning
puririshi98 Nov 1, 2023
f5ebbc4
fixing worldsize issue on get_num_workers
puririshi98 Nov 13, 2023
d683597
SNMG needed a timer
puririshi98 Nov 14, 2023
29f4abe
fixing
puririshi98 Nov 14, 2023
0afa7ab
Merge branch 'master' into cugraph-paper100m
puririshi98 Nov 17, 2023
89f6535
renaming to match mag240m PR
puririshi98 Dec 8, 2023
5d4c54a
renaming
puririshi98 Dec 8, 2023
ef4dfb5
Merge branch 'master' into cugraph-paper100m
puririshi98 Dec 8, 2023
c7bf8df
Update CHANGELOG.md
puririshi98 Dec 8, 2023
887b0a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 8, 2023
9f3c3f5
adding drop_last=True and shuffle=True
puririshi98 Dec 11, 2023
fd68f4f
fixing
puririshi98 Dec 11, 2023
0612dbf
fixing
puririshi98 Dec 11, 2023
d1db8b9
removing the multinode changes, making it a seperate PR
puririshi98 Jan 4, 2024
4ddb2fe
Update examples/multi_gpu/papers100m_gcn.py
puririshi98 Jan 5, 2024
4be2028
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 5, 2024
c4df7bb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 5, 2024
3ff46cd
addressing review
puririshi98 Jan 5, 2024
b409c2c
adding comment for rapids no init
puririshi98 Jan 5, 2024
31d3d5a
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 8, 2024
a35c74f
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 10, 2024
81522be
eval on all ranks
puririshi98 Jan 11, 2024
d9d02c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2024
6c36fd1
Apply suggestions from code review
akihironitta Jan 11, 2024
66dd97b
apply alexandria's fix
puririshi98 Jan 11, 2024
7ef00fe
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 11, 2024
4df8f10
cleaning up for pre-commit
puririshi98 Jan 12, 2024
48cc061
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 12, 2024
e39aade
alexandria's fix
puririshi98 Jan 12, 2024
dff6fe3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 12, 2024
240e41c
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 12, 2024
1548698
fixing precommit ci
puririshi98 Jan 12, 2024
76602e7
fixing eval on all ranks for native pyg
puririshi98 Jan 16, 2024
443023f
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 22, 2024
869f608
cleaning
puririshi98 Jan 22, 2024
54a9cf5
cleaning
puririshi98 Jan 23, 2024
cdfce2e
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 23, 2024
7f0dc20
new better hyperparams increasing test acc to 45%
puririshi98 Jan 29, 2024
48e7e2a
new 45% test acc hyperparams
puririshi98 Jan 29, 2024
87d9233
20 epochs
puririshi98 Jan 29, 2024
6dbcdb4
Merge branch 'master' into cugraph-paper100m
puririshi98 Jan 29, 2024
6d5b057
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 29, 2024
183d671
adding cuda synchronize
puririshi98 Jan 29, 2024
4b13b30
adding cuda sync
puririshi98 Jan 29, 2024
a0bd85b
fixing timing
puririshi98 Jan 29, 2024
4a15b7a
cleaning
puririshi98 Jan 29, 2024
e047aa2
cleaning
puririshi98 Jan 30, 2024
704f974
fixing typo
puririshi98 Jan 30, 2024
1706a03
new hyperparams
puririshi98 Jan 30, 2024
074e7ae
new hyperparams
puririshi98 Jan 30, 2024
e0fc933
cuda sync for timing
puririshi98 Jan 31, 2024
5fb7094
cuda sync for timing
puririshi98 Jan 31, 2024
15e63f3
adding n_devices flag
puririshi98 Jan 31, 2024
74ab468
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 31, 2024
301420c
adding timer for training prep
puririshi98 Jan 31, 2024
133fbf6
training prep timer
puririshi98 Jan 31, 2024
370ab98
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 31, 2024
ddc4262
fixing syntax
puririshi98 Jan 31, 2024
38c65bf
fixing typo
puririshi98 Jan 31, 2024
6190e16
clean up
puririshi98 Feb 5, 2024
7ea89d3
cleaning
puririshi98 Feb 5, 2024
e513f9e
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 5, 2024
88ce13a
better_timer
puririshi98 Feb 5, 2024
9ea04c8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 5, 2024
149abc6
better timing
puririshi98 Feb 5, 2024
ffc8918
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 5, 2024
c49653a
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 7, 2024
6d6ed12
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 20, 2024
00d04e2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
aeb4000
cleanup accuracy eval
puririshi98 Feb 20, 2024
5af3086
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
eb5fbf0
precommit cleanup
puririshi98 Feb 20, 2024
7bd924c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
5907124
cleaning precommit ci up
puririshi98 Feb 20, 2024
c431645
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
816ac8b
splitting examples up w/ and w/o cugraph
puririshi98 Feb 20, 2024
2868caa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
68db875
splitting single gpu examples w/ and w/o cugraph
puririshi98 Feb 20, 2024
727e5a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
4ddcf48
splitting single node multigpu examples
puririshi98 Feb 20, 2024
2f9a744
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
e84c3a8
Create papers100m_gcn_cugraph.py
puririshi98 Feb 20, 2024
b9de756
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 20, 2024
f78737e
cleaning
puririshi98 Feb 21, 2024
fd27a34
clean up
puririshi98 Feb 21, 2024
3400ad1
clean up
puririshi98 Feb 21, 2024
7136d1d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 21, 2024
2a31369
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 21, 2024
d654cb1
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 21, 2024
fde4c33
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 23, 2024
d0d7ebe
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 26, 2024
041e116
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 27, 2024
7415d50
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 28, 2024
5b21275
Merge branch 'master' into cugraph-paper100m
puririshi98 Feb 29, 2024
d892399
Update CHANGELOG.md
puririshi98 Feb 29, 2024
6fcf208
cleanup
puririshi98 Feb 29, 2024
ff52b52
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 1, 2024
3596c85
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 1, 2024
a05cc37
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 4, 2024
201f1c9
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 5, 2024
7dd879d
Update README.md
puririshi98 Mar 5, 2024
be579bb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 5, 2024
45200ab
Update README.md
puririshi98 Mar 5, 2024
4af9464
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 5, 2024
061e1ce
Update README.md
puririshi98 Mar 5, 2024
eced7f3
Update examples/README.md
puririshi98 Mar 6, 2024
04b7db5
Update README.md
puririshi98 Mar 6, 2024
5c4088a
Update examples/multi_gpu/README.md
puririshi98 Mar 6, 2024
a110a25
Update examples/multi_gpu/README.md
puririshi98 Mar 6, 2024
887c769
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
02ec828
cleaning up num workers function
puririshi98 Mar 6, 2024
b9ec4de
cugraph neighborloader only returns HeteroData objects, papers100m is…
puririshi98 Mar 6, 2024
a98bede
syntax cleanup
puririshi98 Mar 6, 2024
92c9147
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
cbd60c2
cugraph always returns heterodata, need to_homo
puririshi98 Mar 6, 2024
ba2d9c3
to homo not needed for vanilla pyg neighborloader
puririshi98 Mar 6, 2024
4d2533d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
235224f
no to_homo needed for vanilla
puririshi98 Mar 6, 2024
e51811d
Update examples/multi_gpu/papers100m_gcn.py
puririshi98 Mar 6, 2024
e4805b1
eval->val loader
puririshi98 Mar 6, 2024
4596bcb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
42855d8
reset acc after each eval
puririshi98 Mar 6, 2024
10c56cb
acc resets and syntax cleanup
puririshi98 Mar 6, 2024
ce3dcfc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
d9ae413
cleanup of init_pytorch_worker(no longer a function, just commenting it)
puririshi98 Mar 6, 2024
37dabfa
typo cleanup
puririshi98 Mar 6, 2024
01bd18c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
4d11c46
model.module not needed for vanilla PyG
puririshi98 Mar 6, 2024
84f1f55
comments for cugraph feature store
puririshi98 Mar 6, 2024
265f058
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 6, 2024
542a718
fix from alexandria for single gpu
puririshi98 Mar 8, 2024
7e89099
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 8, 2024
0c643b7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 8, 2024
388f619
precommit CI
puririshi98 Mar 8, 2024
c740a96
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 8, 2024
8613822
Update ogbn_papers_100m_cugraph.py
puririshi98 Mar 9, 2024
c1df757
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 9, 2024
70383dc
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 11, 2024
2f08e2e
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 12, 2024
5f282ee
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 13, 2024
267c3d3
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 15, 2024
26e285f
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 20, 2024
91f13c4
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 26, 2024
bd73b03
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 26, 2024
9eb7723
Merge branch 'master' into cugraph-paper100m
puririshi98 Mar 27, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Added

- Added support for cuGraph data loading and `GAT` in single node Papers100m examples ([#8173](https://github.com/pyg-team/pytorch_geometric/pull/8173))
- Added the `VariancePreservingAggregation` (VPA) ([#9075](https://github.com/pyg-team/pytorch_geometric/pull/9075))
- Added option to pass custom` from_smiles` functionality to `PCQM4Mv2` and `MoleculeNet` ([#9073](https://github.com/pyg-team/pytorch_geometric/pull/9073))
- Added `group_cat` functionality ([#9029](https://github.com/pyg-team/pytorch_geometric/pull/9029))
1 change: 1 addition & 0 deletions examples/README.md
@@ -12,6 +12,7 @@ For examples on [Open Graph Benchmark](https://ogb.stanford.edu/) datasets, see
- [`ogbn_products_sage.py`](./ogbn_products_sage.py) and [`ogbn_products_gat.py`](./ogbn_products_gat.py) show how to train [`GraphSAGE`](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.GraphSAGE.html) and [`GAT`](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.GAT.html) models on the `ogbn-products` dataset.
- [`ogbn_proteins_deepgcn.py`](./ogbn_proteins_deepgcn.py) is an example to showcase how to train deep GNNs on the `ogbn-proteins` dataset.
- [`ogbn_papers_100m.py`](./ogbn_papers_100m.py) is an example for training a GNN on the large-scale `ogbn-papers100m` dataset, containing approximately ~1.6B edges.
- [`ogbn_papers_100m_cugraph.py`](./ogbn_papers_100m_cugraph.py) shows how to accelerate the `ogbn-papers100m` workflow using [CuGraph](https://github.com/rapidsai/cugraph).

For examples on using `torch.compile`, see the examples under [`examples/compile`](./compile).

3 changes: 2 additions & 1 deletion examples/multi_gpu/README.md
@@ -8,7 +8,8 @@
| [`distributed_sampling.py`](./distributed_sampling.py) | single-node | Example for training GNNs on a homogeneous graph with neighbor sampling. |
| [`distributed_sampling_multinode.py`](./distributed_sampling_multinode.py) | multi-node | Example for training GNNs on a homogeneous graph with neighbor sampling on multiple nodes. |
| [`distributed_sampling_multinode.sbatch`](./distributed_sampling_multinode.sbatch) | multi-node | Example for submitting a training job to a Slurm cluster using [`distributed_sampling_multi_node.py`](./distributed_sampling_multinode.py). |
| [`papers100m_gcn.py`](./papers100m_gcn.py) | single-node | Example for training GNNs on a homogeneous graph. |
| [`papers100m_gcn.py`](./papers100m_gcn.py) | single-node | Example for training GNNs on the `ogbn-papers100M` homogeneous graph w/ ~1.6B edges. |
| [`papers100m_gcn_cugraph.py`](./papers100m_gcn_cugraph.py) | single-node | Example for accelerating GNN training on `ogbn-papers100M` using [CuGraph](...). |
| [`papers100m_gcn_multinode.py`](./papers100m_gcn_multinode.py) | multi-node | Example for training GNNs on a homogeneous graph on multiple nodes. |
| [`mag240m_graphsage.py`](./mag240m_graphsage.py) | single-node | Example for training GNNs on a large heterogeneous graph. |
| [`taobao.py`](./taobao.py) | single-node | Example for training link prediction GNNs on a heterogeneous graph. |
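
The single-node examples in the table above start one process per GPU, and each process creates its own `NeighborLoader` with its own pool of CPU workers. The following is a minimal sketch of the worker-count heuristic that `papers100m_gcn.py` applies, written with integer division and the standard library only. It is an illustration, not the code from the PR, and the fallback to a single CPU when `os.cpu_count()` returns `None` is an assumption added here.

```python
import os


def get_num_workers(world_size: int) -> int:
    """Return a DataLoader worker count for each of `world_size` processes."""
    try:
        # Linux only: the set of CPUs this process is allowed to run on,
        # which can be smaller than the machine total inside containers.
        num_cpus = len(os.sched_getaffinity(0))
    except (AttributeError, OSError):
        # Other platforms: fall back to the machine-wide CPU count.
        # Assumption for this sketch: treat an unknown count as 1.
        num_cpus = os.cpu_count() or 1
    # Give each GPU process half of its equal share of the CPUs.
    return num_cpus // (2 * world_size)


if __name__ == "__main__":
    for n_gpus in (1, 2, 4, 8):
        print(n_gpus, "GPU(s) ->", get_num_workers(n_gpus), "workers per process")
```

On machines with few cores the result can be `0`, which a PyTorch `DataLoader` interprets as loading batches in the main process rather than as an error.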
239 changes: 147 additions & 92 deletions examples/multi_gpu/papers100m_gcn.py
@@ -1,4 +1,6 @@
import argparse
import os
import tempfile
import time

import torch
@@ -7,136 +9,189 @@
import torch.nn.functional as F
from ogb.nodeproppred import PygNodePropPredDataset
from torch.nn.parallel import DistributedDataParallel
from torchmetrics import Accuracy

import torch_geometric
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GCNConv


def get_num_workers(world_size: int) -> int:
num_workers = None
def get_num_workers(world_size):
num_work = None
if hasattr(os, "sched_getaffinity"):
try:
num_workers = len(os.sched_getaffinity(0)) // (2 * world_size)
num_work = len(os.sched_getaffinity(0)) / (2 * world_size)
except Exception:
pass
if num_workers is None:
num_workers = os.cpu_count() // (2 * world_size)
return num_workers
if num_work is None:
num_work = os.cpu_count() / (2 * world_size)
return int(num_work)


class GCN(torch.nn.Module):
def __init__(self, in_channels, hidden_channels, out_channels):
super().__init__()
self.conv1 = GCNConv(in_channels, hidden_channels)
self.conv2 = GCNConv(hidden_channels, out_channels)
def run_train(rank, data, world_size, model, epochs, batch_size, fan_out,
split_idx, num_classes, wall_clock_start, tempdir=None,
num_layers=3):

def forward(self, x, edge_index=None):
x = F.dropout(x, p=0.5, training=self.training)
x = self.conv1(x, edge_index).relu()
x = F.dropout(x, p=0.5, training=self.training)
x = self.conv2(x, edge_index)
return x


def run(rank, world_size, data, split_idx, model):
# init pytorch worker
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '12355'
dist.init_process_group('nccl', rank=rank, world_size=world_size)

split_idx['train'] = split_idx['train'].split(
split_idx['train'].size(0) // world_size,
dim=0,
)[rank].clone()

model = DistributedDataParallel(model.to(rank), device_ids=[rank])
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
if world_size > 1:
split_idx['train'] = split_idx['train'].split(
split_idx['train'].size(0) // world_size, dim=0)[rank].clone()
split_idx['valid'] = split_idx['valid'].split(
split_idx['valid'].size(0) // world_size, dim=0)[rank].clone()
split_idx['test'] = split_idx['test'].split(
split_idx['test'].size(0) // world_size, dim=0)[rank].clone()
model = model.to(rank)
model = DistributedDataParallel(model, device_ids=[rank])
optimizer = torch.optim.Adam(model.parameters(), lr=0.01,
weight_decay=0.0005)

kwargs = dict(
data=data,
batch_size=128,
num_workers=get_num_workers(world_size),
num_neighbors=[50, 50],
num_neighbors=[fan_out] * num_layers,
batch_size=batch_size,
)
train_loader = NeighborLoader(
input_nodes=split_idx['train'],
shuffle=True,
**kwargs,
)
if rank == 0:
val_loader = NeighborLoader(input_nodes=split_idx['valid'], **kwargs)
test_loader = NeighborLoader(input_nodes=split_idx['test'], **kwargs)

val_steps = 1000
warmup_steps = 100
num_work = get_num_workers(world_size)
train_loader = NeighborLoader(data, input_nodes=split_idx['train'],
num_workers=num_work, shuffle=True,
drop_last=True, **kwargs)
val_loader = NeighborLoader(data, input_nodes=split_idx['valid'],
num_workers=num_work, **kwargs)
test_loader = NeighborLoader(data, input_nodes=split_idx['test'],
num_workers=num_work, **kwargs)

eval_steps = 1000
warmup_steps = 20
acc = Accuracy(task="multiclass", num_classes=num_classes).to(rank)
dist.barrier()
torch.cuda.synchronize()
if rank == 0:
prep_time = round(time.perf_counter() - wall_clock_start, 2)
print("Total time before training begins (prep_time) =", prep_time,
"seconds")
print("Beginning training...")

for epoch in range(1, 4):
model.train()
for epoch in range(epochs):
for i, batch in enumerate(train_loader):
if i == warmup_steps:
torch.cuda.synchronize()
start = time.time()
batch = batch.to(rank)
batch_size = batch.num_sampled_nodes[0]
batch.y = batch.y.to(torch.long)
optimizer.zero_grad()
y = batch.y[:batch.batch_size].view(-1).to(torch.long)
out = model(batch.x, batch.edge_index)[:batch.batch_size]
loss = F.cross_entropy(out, y)
out = model(batch.x, batch.edge_index)
loss = F.cross_entropy(out[:batch_size], batch.y[:batch_size])
loss.backward()
optimizer.step()

if rank == 0 and i % 10 == 0:
print(f'Epoch: {epoch:02d}, Iteration: {i}, Loss: {loss:.4f}')

print("Epoch: " + str(epoch) + ", Iteration: " + str(i) +
", Loss: " + str(loss))
nb = i + 1.0
dist.barrier()
torch.cuda.synchronize()
if rank == 0:
sec_per_iter = (time.time() - start) / (i - warmup_steps)
print(f"Avg Training Iteration Time: {sec_per_iter:.6f} s/iter")

model.eval()
total_correct = total_examples = 0
print("Average Training Iteration Time:",
(time.time() - start) / (nb - warmup_steps), "s/iter")
with torch.no_grad():
for i, batch in enumerate(val_loader):
if i >= val_steps:
if i >= eval_steps:
break
if i == warmup_steps:
start = time.time()

batch = batch.to(rank)
with torch.no_grad():
out = model(batch.x, batch.edge_index)[:batch.batch_size]
pred = out.argmax(dim=-1)
y = batch.y[:batch.batch_size].view(-1).to(torch.long)

total_correct += int((pred == y).sum())
total_examples += y.size(0)

print(f"Val Acc: {total_correct / total_examples:.4f}")
sec_per_iter = (time.time() - start) / (i - warmup_steps)
print(f"Avg Inference Iteration Time: {sec_per_iter:.6f} s/iter")

if rank == 0:
model.eval()
total_correct = total_examples = 0
batch_size = batch.num_sampled_nodes[0]

batch.y = batch.y.to(torch.long)
out = model(batch.x, batch.edge_index)
acc_i = acc( # noqa
out[:batch_size].softmax(dim=-1), batch.y[:batch_size])
acc_sum = acc.compute()
if rank == 0:
print(f"Validation Accuracy: {acc_sum * 100.0:.4f}%", )
dist.barrier()
acc.reset()

with torch.no_grad():
for i, batch in enumerate(test_loader):
batch = batch.to(rank)
with torch.no_grad():
out = model(batch.x, batch.edge_index)[:batch.batch_size]
pred = out.argmax(dim=-1)
y = batch.y[:batch.batch_size].view(-1).to(torch.long)
batch_size = batch.num_sampled_nodes[0]

total_correct += int((pred == y).sum())
total_examples += y.size(0)
print(f"Test Acc: {total_correct / total_examples:.4f}")
batch.y = batch.y.to(torch.long)
out = model(batch.x, batch.edge_index)
acc_i = acc( # noqa
out[:batch_size].softmax(dim=-1), batch.y[:batch_size])
acc_sum = acc.compute()
if rank == 0:
print(f"Test Accuracy: {acc_sum * 100.0:.4f}%", )
dist.barrier()
acc.reset()
if rank == 0:
total_time = round(time.perf_counter() - wall_clock_start, 2)
print("Total Program Runtime (total_time) =", total_time, "seconds")
print("total_time - prep_time =", total_time - prep_time, "seconds")


if __name__ == '__main__':
dataset = PygNodePropPredDataset(name='ogbn-papers100M')
split_idx = dataset.get_idx_split()
model = GCN(dataset.num_features, 64, dataset.num_classes)

world_size = torch.cuda.device_count()
print('Let\'s use', world_size, 'GPUs!')
mp.spawn(
run,
args=(world_size, dataset[0], split_idx, model),
nprocs=world_size,
join=True,
parser = argparse.ArgumentParser()
parser.add_argument('--hidden_channels', type=int, default=256)
parser.add_argument('--num_layers', type=int, default=2)
parser.add_argument('--lr', type=float, default=0.001)
parser.add_argument('--epochs', type=int, default=20)
parser.add_argument('--batch_size', type=int, default=1024)
parser.add_argument('--fan_out', type=int, default=30)
parser.add_argument(
"--use_gat_conv",
action='store_true',
help="Whether or not to use GATConv. (Defaults to using GCNConv)",
)
parser.add_argument(
"--n_gat_conv_heads",
type=int,
default=4,
help="If using GATConv, number of attention heads to use",
)
parser.add_argument(
"--n_devices", type=int, default=-1,
help="1-8 to use that many GPUs. Defaults to all available GPUs")

args = parser.parse_args()
wall_clock_start = time.perf_counter()

dataset = PygNodePropPredDataset(name='ogbn-papers100M',
root='/datasets/ogb_datasets')
split_idx = dataset.get_idx_split()
data = dataset[0]
data.y = data.y.reshape(-1)
if args.use_gat_conv:
model = torch_geometric.nn.models.GAT(dataset.num_features,
args.hidden_channels,
args.num_layers,
dataset.num_classes,
heads=args.n_gat_conv_heads)
else:
model = torch_geometric.nn.models.GCN(
dataset.num_features,
args.hidden_channels,
args.num_layers,
dataset.num_classes,
)

print("Data =", data)
if args.n_devices == -1:
world_size = torch.cuda.device_count()
else:
world_size = args.n_devices
print('Let\'s use', world_size, 'GPUs!')
with tempfile.TemporaryDirectory() as tempdir:
if world_size > 1:
mp.spawn(
run_train,
args=(data, world_size, model, args.epochs, args.batch_size,
args.fan_out, split_idx, dataset.num_classes,
wall_clock_start, tempdir, args.num_layers),
nprocs=world_size, join=True)
else:
run_train(0, data, world_size, model, args.epochs, args.batch_size,
args.fan_out, split_idx, dataset.num_classes,
wall_clock_start, tempdir, args.num_layers)
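
In `run_train`, each rank keeps one slice of the train, valid and test indices via `split_idx[...].split(size // world_size, dim=0)[rank]`. The sketch below emulates that slicing on a plain Python list to show what happens when the number of indices is not a multiple of `world_size`. The helper name `shard_for_rank` is hypothetical and does not appear in the PR, and the list slicing is an assumed stand-in for `torch.Tensor.split` with an integer chunk size.

```python
from typing import List, Sequence


def shard_for_rank(indices: Sequence[int], rank: int,
                   world_size: int) -> List[int]:
    """Emulate `tensor.split(len(tensor) // world_size)[rank]` on a sequence."""
    chunk = len(indices) // world_size
    if chunk == 0:
        raise ValueError("fewer indices than ranks")
    # Splitting by a chunk size yields consecutive pieces of length `chunk`,
    # plus shorter or extra trailing pieces when the length does not divide
    # evenly by the number of ranks.
    pieces = [list(indices[i:i + chunk])
              for i in range(0, len(indices), chunk)]
    return pieces[rank]


idx = list(range(10))
shards = [shard_for_rank(idx, r, world_size=4) for r in range(4)]
covered = {i for shard in shards for i in shard}
dropped = sorted(set(idx) - covered)
print(shards)   # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(dropped)  # [8, 9]
```

Under this scheme the trailing `len(indices) % world_size` indices are not assigned to any rank. For a dataset the size of `ogbn-papers100M` that is at most a few nodes per split, so the effect on the reported accuracy should be negligible, but it does mean the evaluated set can differ slightly between GPU counts.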