Convert column dtypes to pyarrow by default in .from_dataframe() #303

Closed
2 of 3 tasks
hombit opened this issue May 2, 2024 · 0 comments · Fixed by #306
Labels
enhancement New feature or request

Comments

@hombit
Contributor

hombit commented May 2, 2024

Feature request

As part of our transition to pyarrow dtypes, it would be great to convert the dtypes of input columns in .from_dataframe() by default. We could still offer an opt-out option for users who have pyarrow-incompatible types (e.g. object) or who wish to keep their original types for another reason.

Implementation ideas

DataFrame dtypes could be converted like this:

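import pandas as pd
import pyarrow as pa

def convert_dtypes_to_pyarrow(df):
    """Return a DataFrame whose columns use pyarrow-backed dtypes."""
    new_series = {}
    for column in df.columns:
        try:
            pa_array = pa.array(df[column], from_pandas=True)
        except (ValueError, TypeError) as err:
            # The error handling is still to be decided; re-raising with the
            # column name is one option, used here as a placeholder.
            raise TypeError(
                f"Column {column!r} cannot be converted to a pyarrow dtype"
            ) from err
        series = pd.Series(
            pa_array,
            dtype=pd.ArrowDtype(pa_array.type),
            # Reuse the original index; otherwise the DataFrame constructor
            # would realign a default RangeIndex against df.index.
            index=df.index,
            copy=False,
        )
        new_series[column] = series
    # pd.DataFrame has no `name` argument, so only the index is passed.
    return pd.DataFrame(new_series, index=df.index, copy=False)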

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.
@hombit hombit added the enhancement New feature or request label May 2, 2024
@hombit hombit linked a pull request May 7, 2024 that will close this issue
@nevencaplar nevencaplar moved this to In Progress in HATS / LSDB May 8, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in HATS / LSDB May 9, 2024
Projects
Status: Done

1 participant