inductive_train_test_split() and split_graph() functions #8243

Open
wants to merge 24 commits into
base: master

ogawayuto
@ogawayuto ogawayuto commented Oct 22, 2023

Hi,

This PR follows up on Discussion #8238. While masks and the subgraph() function can be used to implement inductive node/link tasks, there is currently no explicit function for splitting data into training and testing sets in inductive settings. In addition, inductive link prediction requires distinguishing which edges are intended for training and which for testing, but subgraph() does not return any information about test edges.

To address these limitations, I've introduced the inductive_train_test_split() function. It divides a graph into a train graph and a test graph and lets you specify which edges are allocated to training and which to testing, giving a clear separation of data for inductive tasks. It is also designed to integrate easily with scikit-learn splitters such as KFold or StratifiedKFold.
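To make the idea of separating train and test edges concrete, here is a minimal pure-Python sketch. It is not the PR's implementation: the helper name `split_edges_inductive` and the rule "an edge is a train edge only if both endpoints are train nodes, otherwise it is a test edge if it touches a test node" are assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def split_edges_inductive(
    edges: List[Edge],
    train_nodes: Set[int],
    test_nodes: Set[int],
) -> Tuple[List[Edge], List[Edge]]:
    """Hypothetical helper (not the PR's API): assign each edge to train or test."""
    train_edges: List[Edge] = []
    test_edges: List[Edge] = []
    for src, dst in edges:
        if src in train_nodes and dst in train_nodes:
            # Both endpoints are visible during training.
            train_edges.append((src, dst))
        elif src in test_nodes or dst in test_nodes:
            # At least one endpoint is unseen during training.
            test_edges.append((src, dst))
    return train_edges, test_edges


# A 5-node cycle with nodes 0-2 used for training and nodes 3-4 held out.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
train_edges, test_edges = split_edges_inductive(edges, {0, 1, 2}, {3, 4})
print(train_edges)  # [(0, 1), (1, 2)]
print(test_edges)   # [(2, 3), (3, 4), (4, 0)]
```

Under this assumed rule, edges that cross the train/test boundary are kept out of the train graph, so no test node leaks into training.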

This commit includes three main changes:

  1. Modification of utils/__init__.py.

  2. Creation of inductive_train_test_split.py.

This file contains two functions: inductive_train_test_split() and split_graph(). The former leverages the latter and operates on Data objects. It also performs checks to ensure there is no overlap between the two subsets and that all nodes are covered.

  3. Creation of inductive_train_test_split_test.py.

This file includes tests for the two functions using simple examples. The test data aligns with the examples provided in the docstring.
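The overlap and coverage checks described for inductive_train_test_split.py could look roughly like the sketch below. The function name `check_node_partition` and its error messages are hypothetical, and plain Python sets stand in for the tensors a Data object would hold.

```python
from typing import Iterable


def check_node_partition(
    train_nodes: Iterable[int],
    test_nodes: Iterable[int],
    num_nodes: int,
) -> None:
    """Hypothetical check: the two subsets must be disjoint and cover all nodes."""
    train, test = set(train_nodes), set(test_nodes)
    overlap = train & test
    if overlap:
        raise ValueError(f"nodes in both subsets: {sorted(overlap)}")
    missing = set(range(num_nodes)) - (train | test)
    if missing:
        raise ValueError(f"nodes in neither subset: {sorted(missing)}")


# A valid partition of 5 nodes passes silently.
check_node_partition([0, 1, 2], [3, 4], num_nodes=5)
```

Raising early on an invalid partition keeps a mistaken split from silently producing a train graph that contains test nodes.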

Your reviews and comments on this contribution are greatly appreciated!

@ogawayuto ogawayuto requested a review from wsad1 as a code owner October 22, 2023 02:33
@codecov
codecov bot commented Oct 22, 2023

Codecov Report

Merging #8243 (6e89a45) into master (9f7e824) will decrease coverage by 0.60%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #8243      +/-   ##
==========================================
- Coverage   89.00%   88.40%   -0.60%     
==========================================
  Files         475      476       +1     
  Lines       28841    28912      +71     
==========================================
- Hits        25670    25561     -109     
- Misses       3171     3351     +180     
Files Coverage Δ
torch_geometric/utils/__init__.py 100.00% <100.00%> (ø)
...orch_geometric/utils/inductive_train_test_split.py 100.00% <100.00%> (ø)

... and 37 files with indirect coverage changes

@ogawayuto ogawayuto changed the title from "inductive_train_test_split() and split_graph() functions from #8238" to "inductive_train_test_split() and split_graph() functions from" Oct 22, 2023
@ogawayuto ogawayuto changed the title from "inductive_train_test_split() and split_graph() functions from" to "inductive_train_test_split() and split_graph() functions" Oct 22, 2023
@rusty1s rusty1s changed the title from "inductive_train_test_split() and split_graph() functions" to "inductive_train_test_split() and split_graph() functions" Oct 22, 2023
@ogawayuto
Author

This pull request is ready for review!
