Commit

Merge remote-tracking branch 'upstream/main' into mkdocs

DeaMariaLeon committed Oct 28, 2024
2 parents c839c84 + 800102f commit 8a7fed8
Showing 284 changed files with 4,712 additions and 1,757 deletions.
51 changes: 43 additions & 8 deletions .github/workflows/downstream_tests.yml
@@ -87,21 +87,16 @@ jobs:
- name: show-deps
run: uv pip freeze
- name: Create assets directory, copy over index.html
continue-on-error: true
run: |
mkdir -p marimo/marimo/_static/assets
cp marimo/frontend/index.html marimo/marimo/_static/index.html
cp marimo/frontend/public/favicon.ico marimo/marimo/_static/favicon.ico
- name: Run tests with minimal dependencies
if: ${{ matrix.dependencies == 'core' }}
run: |
cd marimo
hatch run +py=${{ matrix.python-version }} test:test -v tests/ -k "not test_cli"
timeout-minutes: 15
- name: Run tests with optional dependencies
- name: Run tests with full dependencies
if: ${{ matrix.dependencies == 'core,optional' }}
run: |
cd marimo
hatch run +py=${{ matrix.python-version }} test-optional:test -v tests/ -k "not test_cli"
hatch run +py=${{ matrix.python-version }} test-optional:test-narwhals
timeout-minutes: 15
- name: Run typechecks
run: |
@@ -186,3 +181,43 @@ jobs:
run: |
cd py-shiny
make narwhals-test-integration
tubular:
strategy:
matrix:
python-version: ["3.12"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
enable-cache: "true"
cache-suffix: ${{ matrix.python-version }}
cache-dependency-glob: "**requirements*.txt"
- name: clone-tubular
run: |
git clone https://github.com/lvgig/tubular --depth=1
cd tubular
git log
- name: install-basics
run: uv pip install --upgrade tox virtualenv setuptools pytest-env --system
- name: install-tubular-dev
run: |
cd tubular
uv pip install -e .[dev] --system
- name: install-narwhals-dev
run: |
uv pip uninstall narwhals --system
uv pip install -e . --system
- name: show-deps
run: uv pip freeze
- name: Run pytest
run: |
cd tubular
pytest tests --config-file=pyproject.toml
2 changes: 1 addition & 1 deletion .github/workflows/extremes.yml
@@ -90,7 +90,7 @@ jobs:
nightlies:
strategy:
matrix:
python-version: ["3.11"]
python-version: ["3.13"]
os: [ubuntu-latest]
if: github.event.pull_request.head.repo.full_name == github.repository
runs-on: ${{ matrix.os }}
4 changes: 2 additions & 2 deletions .github/workflows/pytest.yml
@@ -34,7 +34,7 @@ jobs:
pytest-windows:
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.12"]
os: [windows-latest]

runs-on: ${{ matrix.os }}
@@ -61,7 +61,7 @@ jobs:
pytest-coverage:
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.9", "3.11", "3.13"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
1 change: 1 addition & 0 deletions .gitignore
@@ -17,6 +17,7 @@ coverage.xml
# Documentation
site/
todo.md
docs/this.md
docs/api-completeness/*.md
!docs/api-completeness/index.md

6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -1,15 +1,15 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.6.9'
rev: 'v0.7.0'
hooks:
# Run the formatter.
- id: ruff-format
# Run the linter.
- id: ruff
args: [--fix]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v1.11.2'
rev: 'v1.12.1'
hooks:
- id: mypy
additional_dependencies: ['polars==1.4.1', 'pytest==8.3.2']
@@ -40,7 +40,7 @@ repos:
hooks:
- id: nbstripout
- repo: https://github.com/adamchainz/blacken-docs
rev: "1.18.0" # replace with latest tag on GitHub
rev: "1.19.0" # replace with latest tag on GitHub
hooks:
- id: blacken-docs
args: [--skip-errors]
17 changes: 12 additions & 5 deletions CONTRIBUTING.md
@@ -51,17 +51,20 @@ Here's how you can set up your local development environment to contribute.

#### Option 1: Use UV (recommended)

1. Make sure you have Python3.8+ installed (for example, Python 3.11), create a virtual environment,
1. Make sure you have Python3.12 installed, create a virtual environment,
and activate it. If you're new to this, here's one way that we recommend:
1. Install uv: https://github.com/astral-sh/uv?tab=readme-ov-file#getting-started
2. Install some version of Python greater than Python3.8. For example, to install
Python3.11:
or make sure it is up-to-date with:
```
uv python install 3.11
uv self update
```
2. Install Python3.12:
```
uv python install 3.12
```
3. Create a virtual environment:
```
uv venv -p 3.11 --seed
uv venv -p 3.12 --seed
```
4. Activate it. On Linux, this is `. .venv/bin/activate`, on Windows `.\.venv\Scripts\activate`.
2. Install Narwhals: `uv pip install -e .`
@@ -106,6 +109,10 @@ nox

Note that nox also requires all the Python versions defined in `noxfile.py` to be installed on your system.

#### Testing cuDF

We can't currently test in CI against cuDF, but you can test it manually in Kaggle using GPUs. Please follow this [Kaggle notebook](https://www.kaggle.com/code/marcogorelli/testing-cudf-in-narwhals) to run the tests.
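
For orientation, here is a minimal sketch (not the linked notebook's exact contents) of what such a manual run might look like from a Kaggle GPU notebook, assuming you are in a checkout of the repository with narwhals and its test dependencies installed:

```python
# Illustrative only: verify the GPU environment, then run the test suite in-process.
import cudf  # sanity check that cuDF is importable on the GPU instance
import pytest

print(cudf.__version__)
pytest.main(["tests"])  # run the narwhals test suite from the repository root
```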

### 7. Building docs

To build the docs, run `mkdocs serve`, and then open the link provided in a browser.
5 changes: 4 additions & 1 deletion README.md
@@ -43,10 +43,13 @@ Join the party!

- [Altair](https://github.com/vega/altair/)
- [Hamilton](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/narwhals)
- [marimo](https://github.com/marimo-team/marimo)
- [pymarginaleffects](https://github.com/vincentarelbundock/pymarginaleffects)
- [scikit-lego](https://github.com/koaning/scikit-lego)
- [scikit-playtime](https://github.com/koaning/scikit-playtime)
- [timebasedcv](https://github.com/FBruzzesi/timebasedcv)
- [marimo](https://github.com/marimo-team/marimo)
- [tubular](https://github.com/lvgig/tubular)
- [wimsey](https://github.com/benrutter/wimsey)

Feel free to add your project to the list if it's missing, and/or
[chat with us on Discord](https://discord.gg/V3PqtB4VA4) if you'd like any support.
5 changes: 5 additions & 0 deletions docs/api-reference/dependencies.md
@@ -11,14 +11,19 @@
- get_polars
- get_pyarrow
- is_cudf_dataframe
- is_cudf_index
- is_cudf_series
- is_dask_dataframe
- is_ibis_table
- is_into_series
- is_modin_dataframe
- is_modin_index
- is_modin_series
- is_numpy_array
- is_pandas_dataframe
- is_pandas_index
- is_pandas_like_dataframe
- is_pandas_like_index
- is_pandas_like_series
- is_pandas_series
- is_polars_dataframe
3 changes: 2 additions & 1 deletion docs/api-reference/dtypes.md
@@ -6,7 +6,6 @@
members:
- Array
- List
- Struct
- Int64
- Int32
- Int16
@@ -15,12 +14,14 @@
- UInt32
- UInt16
- UInt8
- Field
- Float64
- Float32
- Boolean
- Categorical
- Enum
- String
- Struct
- Date
- Datetime
- Duration
19 changes: 10 additions & 9 deletions docs/api-reference/expr_dt.md
@@ -6,22 +6,23 @@
members:
- convert_time_zone
- date
- year
- month
- day
- ordinal_day
- hour
- minute
- second
- millisecond
- microsecond
- millisecond
- minute
- month
- nanosecond
- ordinal_day
- replace_time_zone
- total_minutes
- total_seconds
- total_milliseconds
- second
- timestamp
- total_microseconds
- total_milliseconds
- total_minutes
- total_nanoseconds
- total_seconds
- to_string
- year
show_source: false
show_bases: false
3 changes: 3 additions & 0 deletions docs/api-reference/narwhals.md
@@ -14,6 +14,8 @@ Here are the top-level functions available in Narwhals.
- concat_str
- from_dict
- from_native
- from_arrow
- generate_temporary_column_name
- get_level
- get_native_namespace
- is_ordered_categorical
@@ -38,4 +40,5 @@ Here are the top-level functions available in Narwhals.
- when
- show_versions
- to_native
- to_py_scalar
show_source: false
19 changes: 10 additions & 9 deletions docs/api-reference/series_dt.md
@@ -6,22 +6,23 @@
members:
- convert_time_zone
- date
- year
- month
- day
- ordinal_day
- hour
- minute
- second
- millisecond
- microsecond
- millisecond
- minute
- month
- nanosecond
- ordinal_day
- replace_time_zone
- total_minutes
- total_seconds
- total_milliseconds
- second
- timestamp
- total_microseconds
- total_milliseconds
- total_minutes
- total_nanoseconds
- total_seconds
- to_string
- year
show_source: false
show_bases: false
76 changes: 76 additions & 0 deletions docs/basics/dataframe_conversion.md
@@ -0,0 +1,76 @@
# Conversion between libraries

Some library maintainers need to apply complex dataframe operations, using methods and functions that may not (yet) be implemented in Narwhals. In such cases, Narwhals can still be highly beneficial, as it makes converting between dataframe libraries easy.

## Dataframe X in, pandas out

Imagine that you maintain a library with a function that operates on pandas dataframes to produce automated reports. You want to allow users to supply a dataframe in any format to that function (pandas, Polars, DuckDB, cuDF, Modin, etc.) without adding all those dependencies to your own project and without special-casing each input library's variation of `to_pandas` / `toPandas` / `to_pandas_df` / `df` ...

One solution is to use Narwhals as a thin dataframe-ingestion layer that converts user-supplied dataframes to the format your library uses internally. Since Narwhals is zero-dependency, this is a much more lightweight solution than including all the dataframe libraries as dependencies, and easier to write than special-casing each input library's `to_pandas` method (if it even exists!).

To illustrate, we create dataframes in various formats:

```python exec="1" source="above" session="conversion"
import narwhals as nw
from narwhals.typing import IntoDataFrame

import duckdb
import polars as pl
import pandas as pd

df_polars = pl.DataFrame(
{
"A": [1, 2, 3, 4, 5],
"fruits": ["banana", "banana", "apple", "apple", "banana"],
"B": [5, 4, 3, 2, 1],
"cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
}
)
df_pandas = df_polars.to_pandas()
df_duckdb = duckdb.sql("SELECT * FROM df_polars")
```

Now, we define a function that can ingest any dataframe type supported by Narwhals, and convert it to a pandas DataFrame for internal use:

```python exec="1" source="above" session="conversion" result="python"
def df_to_pandas(df: IntoDataFrame) -> pd.DataFrame:
return nw.from_native(df).to_pandas()


print(df_to_pandas(df_polars))
```
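
As a quick usage check (not part of the snippet above), the same function accepts the pandas frame created earlier and simply returns a pandas DataFrame again:

```python
# Round-trip check: a pandas input passes through Narwhals and comes back as pandas.
result = df_to_pandas(df_pandas)
print(type(result))  # <class 'pandas.core.frame.DataFrame'>
```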

## Dataframe X in, Polars out

### Via PyCapsule Interface

Similarly, if your library uses Polars internally, you can convert any user-supplied dataframe to Polars format using Narwhals.

```python exec="1" source="above" session="conversion" result="python"
def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
return nw.from_arrow(nw.from_native(df), native_namespace=pl).to_native()


print(df_to_polars(df_duckdb)) # You can only execute this line of code once.
```

It works to pass Polars to `native_namespace` here because Polars supports the [PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) for import.

Note that the PyCapsule Interface makes no guarantee that you can call it repeatedly, so the approach above only works if you expect to perform the conversion a single time on each input object.
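
A minimal sketch of how a caller might respect that constraint, assuming it controls the call site: convert each incoming object once, then keep reusing the resulting Polars DataFrame rather than converting the same object again.

```python
# Illustration only: one conversion per input object, reuse the result afterwards.
df_duckdb_once = duckdb.sql("SELECT * FROM df_polars")
pl_df = df_to_polars(df_duckdb_once)  # the single conversion of this object
print(pl_df.select("fruits").head(2))  # further work uses the converted result
```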

### Via PyArrow

If you need to ingest the same dataframe multiple times, then you may want to go via PyArrow instead.
This may be less efficient than the PyCapsule approach above (and always requires PyArrow!), but is more forgiving:

```python exec="1" source="above" session="conversion" result="python"
def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
return pl.DataFrame(nw.from_native(df).to_arrow())


df_duckdb = duckdb.sql("SELECT * FROM df_polars")
print(df_to_polars(df_duckdb)) # We can execute this...
print(df_to_polars(df_duckdb)) # ...as many times as we like!
```