
pandas pyarrow-backed strings display differently to other strings #441

Open
2 of 6 tasks
MarcoGorelli opened this issue Sep 14, 2024 · 1 comment

Comments

@MarcoGorelli

MarcoGorelli commented Sep 14, 2024

Prework

Description

If I run

```python
from great_tables import GT
import pandas as pd

df = pd.DataFrame(
    {
        "example": ["Row " + str(x) for x in range(1, 5)],
        "numbers": [
            "20 23 6 7 37 23 21 4 7 16",
            "2.3 6.8 9.2 2.42 3.5 12.1 5.3 3.6 7.2 3.74",
            "-12 -5 6 3.7 0 8 -7.4",
            "2 0 15 7 8 10 1 24 17 13 6",
        ],
    }
)
GT(df)
```

and then

```python
GT(df.convert_dtypes(dtype_backend='pyarrow'))
```

then the two tables display differently.


For a Polars DataFrame, the alignment seems to be different again.



Expected result

I think all three tables (pandas default, pandas pyarrow-backed, and Polars) should look the same.

Development environment

  • Operating System: Linux
  • great_tables Version: 0.11.0


@machow
Collaborator

machow commented Sep 19, 2024

Shoot -- this issue comes from our very old and rough code for handling alignment:

def align_from_data(self, data: TblData):

Currently, it uses the string names of dtypes, which is a bit brittle (definitely an area where narwhals can help ;). We should move this code into our DataFrame backend layer, _tbl_data.py, and clean up the detection of dtypes.
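To make the brittleness concrete, here is a minimal, hypothetical sketch; it is not great_tables' actual code. It assumes an alignment table keyed on the exact str(dtype) name, with unrecognized names falling back to "center" (both the table contents and the fallback are assumptions). The names fed in are what str(dtype) typically reports for a text column: "object" for a default pandas column, "string[pyarrow]" after convert_dtypes(dtype_backend='pyarrow'), and "String" for Polars ("Utf8" in older Polars releases). The helpers EXACT_ALIGN, align_by_exact_name and align_by_family are illustrative names only.

```python
import re

# Hypothetical sketch, NOT great_tables' implementation.
# Assumption: alignment is looked up by the exact str(dtype) name,
# and any unrecognized name falls back to "center".
EXACT_ALIGN = {
    "object": "left",
    "string": "left",
    "int64": "right",
    "float64": "right",
}

# Numeric families after lowercasing and stripping a "[...]" suffix,
# e.g. "int64", "uint8", "float32", "double", "decimal128(10, 2)".
_NUMERIC = re.compile(r"u?int\d*|float\d*|double|halffloat|decimal.*")
# String families across backends: pandas object/string/str,
# pyarrow string/large_string, Polars String (older releases: Utf8).
_STRING = {"object", "string", "str", "large_string", "utf8", "large_utf8"}


def align_by_exact_name(dtype_name: str) -> str:
    """Brittle: only names spelled exactly like a table key are recognized."""
    return EXACT_ALIGN.get(dtype_name, "center")


def align_by_family(dtype_name: str) -> str:
    """Stopgap: normalize the name, then classify it into a dtype family."""
    base = dtype_name.lower().split("[", 1)[0]
    if base in _STRING:
        return "left"
    if _NUMERIC.fullmatch(base):
        return "right"
    return "center"


# The same text column as seen under three backends (assumed str(dtype) values).
for name in ["object", "string[pyarrow]", "String"]:
    print(f"{name!r:20} exact={align_by_exact_name(name):6} family={align_by_family(name)}")
# exact gives left, center, center; family gives left for all three.
```

Normalizing strings only papers over the problem. Comparing dtype objects inside the backend layer (for example, narwhals' backend-agnostic String dtype, in the spirit of the comment above) would avoid parsing names at all.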
