Integration with TorchData #329

Open
nimaous opened this issue Jan 3, 2024 · 1 comment

Comments

nimaous commented Jan 3, 2024

Hi all,

In my project, I use TorchData to read Parquet files from AWS S3 buckets. Currently, it seems that pytorch-frame cannot be integrated with TorchData. I was wondering whether you have any plans to make this possible, or whether there is a workaround for reading Parquet files from S3 buckets into a torch_frame dataset?

Thanks,

Contributor

yiweny commented Jan 3, 2024

It seems that TorchData is no longer under active development, so I'm not sure we have plans to integrate with it on our side.

If you can load the data stored in the Parquet files into a pandas DataFrame, you can create a DataLoader using torch_frame.data.DataLoader by directly supplying the DataFrame as the dataset argument. However, pandas DataFrames can be memory intensive, so you might run into issues with large datasets.
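A minimal sketch of that workaround, not an official recipe: it assumes `s3fs` and a Parquet engine such as `pyarrow` are installed so that pandas can resolve `s3://` URIs, and that AWS credentials are already configured. The helper name `build_loader` and its parameters are hypothetical. The sketch wraps the DataFrame in `torch_frame.data.Dataset` and materializes it before building the loader, which is one way to read the suggestion above and may not be exactly what was meant.

```python
from typing import Any, Dict, Optional, Sequence


def build_loader(
    parquet_uri: str,
    col_to_stype: Dict[str, Any],
    target_col: str,
    columns: Optional[Sequence[str]] = None,
    batch_size: int = 128,
    shuffle: bool = True,
):
    """Read one Parquet file into pandas and wrap it in a PyTorch Frame loader."""
    # Imported lazily so the helper can be defined without the heavy
    # dependencies installed; they are only needed when it is called.
    import pandas as pd
    from torch_frame.data import DataLoader, Dataset

    # pandas hands "s3://..." URIs to fsspec/s3fs (assumed installed).
    # Restricting `columns` reduces how much data is held in memory.
    df = pd.read_parquet(
        parquet_uri, columns=list(columns) if columns else None
    )

    # col_to_stype maps each column name to a torch_frame semantic type.
    dataset = Dataset(df, col_to_stype=col_to_stype, target_col=target_col)
    dataset.materialize()  # converts the DataFrame into a TensorFrame

    return DataLoader(
        dataset.tensor_frame, batch_size=batch_size, shuffle=shuffle
    )


# Hypothetical usage (bucket, key, and column names are placeholders):
#
#   import torch_frame
#   loader = build_loader(
#       "s3://my-bucket/data.parquet",
#       col_to_stype={
#           "age": torch_frame.numerical,
#           "city": torch_frame.categorical,
#           "label": torch_frame.categorical,
#       },
#       target_col="label",
#   )
```

The whole file still has to fit in memory as a DataFrame, so this does not remove the memory concern for large datasets; selecting only the needed columns merely softens it.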

We do welcome community contributions.
