QST: how can I save my sparse dataframe with indexes and columns to a format other than .pkl? #60017

Closed
2 tasks done
michelkluger opened this issue Oct 10, 2024 · 5 comments
Labels
Needs Info, Sparse, Usage Question

Comments

@michelkluger

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/60956268/save-sparse-pandas-dataframe-to-different-file-types

Question about pandas

I want to have a sparse DataFrame structure saved in a single file (to avoid a proliferation of files).

How can I write it, read it back, and get the same sparse DataFrame? Is there any alternative to pickle?

@michelkluger added the Needs Triage and Usage Question labels on Oct 10, 2024
@rhshadrach added the Needs Info label and removed the Needs Triage label on Oct 22, 2024
@rhshadrach
Member

Is your goal just to save disk space? If you can convert the sparse DataFrame to dense in memory, then you can use Parquet, which is very efficient on sparse data. In the example that you gave on Stack Overflow, I'm seeing the dense DataFrame take up 18 megabytes in memory but only 1.5 megabytes on disk.

@rhshadrach added the Sparse label on Oct 22, 2024
@michelkluger
Author

> Is your goal just to save disk space? If you can convert the sparse DataFrame to dense in memory, then you can use Parquet, which is very efficient on sparse data. In the example that you gave on Stack Overflow, I'm seeing the dense DataFrame take up 18 megabytes in memory but only 1.5 megabytes on disk.

Ideally I would like a format, with a to_<format> method, that is easy to read and write directly from sparse data. If my matrix is too big, converting it to dense will cause memory errors.

@rhshadrach
Member

Makes sense - it seems to me that hdf would be the most appropriate, and for that there is #42070. Closing in favor of that issue.

@michelkluger
Author

> it seems to me that hdf would be the most appropriate,

Why?

@rhshadrach
Member

I do not believe any other file format can support it.
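
For the single-file, no-pickle, no-densifying requirement raised above, one possible workaround outside pandas' built-in writers is to store the COO components plus the axis labels in one NumPy `.npz` archive. This is only a sketch, not a pandas feature: the helper names `save_sparse_frame`, `load_sparse_frame`, and `_labels_to_array` are hypothetical, SciPy is assumed to be installed, and every column is assumed to be sparse with a fill value of 0 (which `DataFrame.sparse.to_coo` requires).

```python
import numpy as np
import pandas as pd
from scipy import sparse


def _labels_to_array(labels: pd.Index) -> np.ndarray:
    """Return axis labels as an array NumPy can store without pickling."""
    arr = labels.to_numpy()
    # Object arrays would be pickled by np.savez; store them as fixed-width strings.
    return arr.astype(str) if arr.dtype.kind == "O" else arr


def save_sparse_frame(df: pd.DataFrame, path: str) -> None:
    """Write an all-sparse DataFrame (fill value 0) to a single .npz file."""
    coo = df.sparse.to_coo()  # built from the stored values only, no dense copy
    np.savez_compressed(
        path,
        data=coo.data,
        row=coo.row,
        col=coo.col,
        shape=np.asarray(coo.shape),
        index=_labels_to_array(df.index),
        columns=_labels_to_array(df.columns),
    )


def load_sparse_frame(path: str) -> pd.DataFrame:
    """Rebuild the sparse DataFrame written by save_sparse_frame."""
    with np.load(path, allow_pickle=False) as npz:
        shape = tuple(int(n) for n in npz["shape"])
        coo = sparse.coo_matrix(
            (npz["data"], (npz["row"], npz["col"])), shape=shape
        )
        return pd.DataFrame.sparse.from_spmatrix(
            coo, index=npz["index"], columns=npz["columns"]
        )
```

Limitations of this sketch: non-numeric labels come back as strings, a MultiIndex is not handled, columns with different dtypes are collapsed to a common dtype by `to_coo`, and non-zero fill values are rejected.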

2 participants