Currently, for string columns, pandas loads the strings as native Python strings, and Dask then creates a slow task to convert them all to pyarrow strings. Pandas has recently introduced support for the pyarrow string dtype, and can load strings from parquet files directly into a pandas DataFrame with the pyarrow string type by specifying `dtype_backend="pyarrow"` as an option in the `pd.read_parquet` call.
We support passing kwargs to this function, but when generating the Dask meta DataFrame from the parquet schema we don't use pyarrow string types, so we get a meta mismatch. This needs to be updated here, and we need to test that the new dtype works with the other `from_delayed` functions for operations like crossmatching and joining, where we generate the meta.