Casting pl.Categorical("physical") to pl.Categorical("lexical") doesn't invalidate sortedness flag #20864

Open
evgenii-kuznetcov opened this issue Jan 23, 2025 · 0 comments
Labels
A-dtype-categorical (Area: categorical data type), accepted (Ready for implementation), bug (Something isn't working), P-high (Priority: high), python (Related to Python Polars)

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

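    import polars as pl

    df = pl.DataFrame({"s": ["b", "a"], "v": [1, 2]})
    # Sort by physical order (order of first appearance); this sets the sorted flag on "s".
    sorted_physically = df.cast({"s": pl.Categorical("physical")}).sort("s")
    # After casting to lexical ordering, this sort is expected to reorder the rows.
    sorted_lexically = sorted_physically.cast({"s": pl.Categorical("lexical")}).sort("s")
    print(sorted_lexically)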

Log output

shape: (2, 2)
┌─────┬─────┐
│ s   ┆ v   │
│ --- ┆ --- │
│ cat ┆ i64 │
╞═════╪═════╡
│ b   ┆ 1   │
│ a   ┆ 2   │
└─────┴─────┘

Issue description

In the reproducible example, the data should be sorted as s: [a, b], but it is sorted as s: [b, a].
It seems that Polars remembers that sorted_physically is already sorted and does not re-sort it after the column has been cast to the lexical ordering.
The issue was introduced sometime between 1.18 and 1.19.

Expected behavior

The data should be sorted as s: [a, b]. Casting to the lexical ordering should invalidate the sortedness flag so that the subsequent sort reorders the rows.
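
To make the expectation concrete, here is a small pure-Python model of a column that carries a sorted flag. It is a toy: `ToyCategorical`, its fields, and the `keep_flag` switch are hypothetical names invented for this illustration and say nothing about how Polars is actually implemented. It only shows why a cast that changes the ordering has to reset the flag.

```python
from dataclasses import dataclass


@dataclass
class ToyCategorical:
    """Hypothetical stand-in for a categorical column (not Polars internals)."""

    categories: list  # position in this list is the physical code
    codes: list       # one physical code per row
    ordering: str = "physical"  # "physical" or "lexical"
    sorted_flag: bool = False

    @classmethod
    def from_strings(cls, values, ordering="physical"):
        # Categories are numbered in order of first appearance.
        categories = list(dict.fromkeys(values))
        codes = [categories.index(v) for v in values]
        return cls(categories, codes, ordering)

    def to_list(self):
        return [self.categories[c] for c in self.codes]

    def sort(self):
        if self.sorted_flag:
            return self  # fast path: trust the flag and skip the sort
        if self.ordering == "lexical":
            key = lambda c: self.categories[c]  # compare by string value
        else:
            key = lambda c: c  # compare by physical code
        return ToyCategorical(
            self.categories, sorted(self.codes, key=key), self.ordering, True
        )

    def cast(self, ordering, keep_flag):
        # keep_flag=True models the reported behavior; keep_flag=False models
        # the expected one, where a change of ordering resets the flag.
        flag = self.sorted_flag and (keep_flag or ordering == self.ordering)
        return ToyCategorical(self.categories, list(self.codes), ordering, flag)


col = ToyCategorical.from_strings(["b", "a"]).sort()  # physically sorted
reported = col.cast("lexical", keep_flag=True).sort().to_list()
expected = col.cast("lexical", keep_flag=False).sort().to_list()

print(reported)  # ['b', 'a']
print(expected)  # ['a', 'b']
```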

Installed versions

Polars:              1.20.0
Index type:          UInt32
Platform:            macOS-15.2-arm64-arm-64bit
Python:              3.10.12 (main, Oct 31 2023, 16:05:04) [Clang 15.0.0 (clang-1500.0.40.1)]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.35.5
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          2.32.0
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         <not installed>
numpy                2.1.0
openpyxl             <not installed>
pandas               <not installed>
pyarrow              <not installed>
pydantic             1.10.18
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

evgenii-kuznetcov added the bug, needs triage, and python labels on Jan 23, 2025
nameexhaustion added the accepted, A-dtype-categorical, and P-high labels and removed the needs triage label on Jan 23, 2025
nameexhaustion changed the title from "Changing pl.Categorical("physical") to pl.Categorical("lexical") doesn't invalidate sorting" to "Casting pl.Categorical("physical") to pl.Categorical("lexical") doesn't invalidate sortedness flag" on Jan 23, 2025