feat(skorch): add an inherited class from skorch.NeuralNet that is compatible with PyTorch Frame #375
base: master
…et that is compatible with PyTorch Frame
for more information, see https://pre-commit.ci
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #375   +/-   ##
=======================================
  Coverage   93.52%   93.52%
=======================================
  Files         124      124
  Lines        6456     6456
=======================================
  Hits         6038     6038
  Misses        418      418
Sure - on both!
1706c96 to 0b9426f
@weihua916 Would you mind reviewing whether you think this is a good way to implement it? Also, it is strange that mypy in pre-commit does not raise errors, but mypy in CI does. I don't think there is any way to deal with this.
@weihua916 @zechengz @yiweny I would appreciate your review. Thank you very much in advance.
Thank you!
Would you mind adding a unit test similar to https://github.com/pyg-team/pytorch-frame/blob/master/test/gbdt/test_gbdt.py?
A kind check-in. Is there any progress here?
No progress, sorry.
…ough it can be used
@weihua916 Removed all tutorials and added
Left a quick question.
    # [stype.text_embedded],
    # [stype.numerical, stype.numerical, stype.text_embedded],
we don't support these stypes?
These are not supported at this time because I have not had time to understand how to use these stypes.
However, since supporting them probably only requires changes to the arguments of the NeuralNet, extending it in the future should be little trouble.
    if pass_dataset:
        net.fit(dataset)
        _ = net.predict(test_dataset)
    else:
        net.fit(X_train, y_train)
        _ = net.predict(X_test)
Why don't we take a tensor frame? It's also weird to sometimes take a dataset and sometimes take a data frame.
- The main purpose of this PR is to allow a DataFrame to be fitted DIRECTLY, as shown in examples/sklearn_api.py.
- It is unclear how to create a Dataset from a TensorFrame, and anyone who has a TensorFrame should also have a Dataset, so there is little need to implement this. Such a user is probably familiar with deep learning and has little need for skorch at all.
- It is not that it "sometimes takes a dataset and sometimes takes a data frame": both input styles are accepted, and both are tested.
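To make the two call styles under discussion concrete, here is a minimal stdlib-only sketch of an estimator whose `fit` accepts either a bundled dataset or separate `X` and `y`. It is not the PR's implementation: `ToyDataset` and `ToyEstimator` are hypothetical stand-ins, and the "training" step is a trivial majority-class rule chosen only so the example runs.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset that bundles features and target."""
    rows: Sequence[Sequence[float]]
    target: Sequence[int]


class ToyEstimator:
    """Hypothetical estimator with an sklearn-style fit/predict surface."""

    def fit(self, X: Any, y: Optional[Sequence[int]] = None) -> "ToyEstimator":
        # Normalise both call styles to the same (rows, target) pair.
        if isinstance(X, ToyDataset):
            if y is not None:
                raise ValueError("y must be omitted when a dataset is passed")
            rows, target = X.rows, X.target
        else:
            if y is None:
                raise ValueError("y is required when raw rows are passed")
            rows, target = X, y
        # Placeholder "training": remember the most frequent class.
        labels = list(target)
        self.majority_ = max(set(labels), key=labels.count)
        self.n_features_ = len(rows[0])
        return self

    def predict(self, X: Any) -> List[int]:
        rows = X.rows if isinstance(X, ToyDataset) else X
        return [self.majority_ for _ in rows]


X_train = [[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]]
y_train = [1, 0, 1]
dataset = ToyDataset(X_train, y_train)

# Both call styles end up in the same code path after normalisation.
pred_from_dataset = ToyEstimator().fit(dataset).predict(dataset)
pred_from_rows = ToyEstimator().fit(X_train, y_train).predict(X_train)
```

The design point is that the dispatch happens once, at the top of `fit`, so everything after it is independent of how the data was passed in.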
Hmm, if a data frame is fed in directly, it is unclear why we need this feature within PyTorch Frame.
The whole point of PyTorch Frame is to materialize a data frame into a tensor frame, to be processed by PyTorch.
It may not match your purpose, but my goal is to use the advanced neural networks implemented in pytorch_frame in an existing sklearn pipeline.
This PR allows pytorch_frame to be used on top of existing scikit-learn code without heavily modifying that code. Since many people use sklearn Pipeline, especially on Kaggle, it becomes easy to check how performance changes by replacing the estimator in other people's code with my NeuralNetPytorchFrame, or by combining the two. I am convinced that this will be very valuable.
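The workflow described here, swapping the final estimator of an existing pipeline while leaving the preprocessing untouched, can be sketched with scikit-learn alone. This is an illustration under assumptions: `MLPClassifier` is only a placeholder for a neural estimator such as the PR's NeuralNetPytorchFrame, whose constructor arguments are not shown in this thread, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real tabular task.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# An "existing" pipeline, as one might find in someone else's notebook.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)
baseline_pred = pipe.predict(X)

# Replace only the final step. MLPClassifier is a placeholder (assumption)
# for an sklearn-compatible neural estimator; the scaler step is unchanged.
pipe.set_params(
    clf=MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
)
pipe.fit(X, y)
swapped_pred = pipe.predict(X)
```

Because only the `clf` step changes, the two fits can be compared under the same preprocessing, for example with `sklearn.model_selection.cross_val_score`.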
@weihua916 Would you please reconsider merging this PR? This change makes it easy to try out (This PR is not intended to save training of Pytorch models that do not use Thank you in advance.
@weihua916 @yiweny @akihironitta Any chance that this PR could be merged?
Great work on this integration! 🚀 Thank you for your effort :)
After reviewing the PR, I'd recommend keeping this integration as an example script rather than merging it into torch_frame/. Since this integration may not see widespread or frequent use, maintaining the code (e.g., keeping up with new APIs that skorch introduces and making sure all features work in PyTorch Frame) could be challenging in the future. IMO, presenting it as an example would make more sense. Alternatively, @34j (together with the contributors in the linked issue) could consider creating a separate repository to showcase this integration as a community-driven effort, and we could help highlight it via our docs or social media. (We can always promote the integration into our codebase in the future if it gains traction and the community expresses a need for it.)
Looks like PyTorch Frame doesn't need many changes to support this integration, but I'm happy to provide any extension points when it's necessary.
I'm open to any comments/feedback/discussion!
Thanks for your positive reply. In that case I would consider trying to make another package, but I am very busy right now, so please leave this as it is for a while. Note that in any case the one-line change I made (in dataset.py) needs to be merged for that to work, so you may want to review that specific line.
Closes #147
@MacOS Please continue from here if it helps. Sorry for being so loud, but this took me a whole day, so I would appreciate it very much if you could add me as a co-author if you use this code.