Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: lutra query runner #4134

Merged
merged 16 commits into from
Feb 5, 2024
Merged

feat: lutra query runner #4134

merged 16 commits into from
Feb 5, 2024

Conversation

aljazerzen
Copy link
Member

@aljazerzen aljazerzen commented Jan 25, 2024

Initial take on #3825. Inspired by prql-query.

Adds a query runner named Lutra that is basically a wrapper for prqlc compile, sqlite3 run and print(pl.DataFrame).

Connection parameters are defined within PRQL source with @lutra.sqlite annotation.

Uses connectorx for executing the query and converting to Apache Arrow. Uses polars to print the dataframe.

Next step is to create Python bindings so the resulting dataframe can be used with other existing tooling.

@hulxv
Copy link
Contributor

hulxv commented Jan 25, 2024

Before a few months, I built a tool called prqlite (I think you took a look at it). Is there any difference between it and Lutra?

@eitsupi
Copy link
Member

eitsupi commented Jan 25, 2024

Oops, this seems to depend on a very old version of polars.......

@aljazerzen
Copy link
Member Author

Before a few months, I built a tool called prqlite (I think you took a look at it). Is there any difference between it and Lutra?

There are. prqlite is primarily a TUI and it does that well. My intention with Lutra is to be able to call it from Python (and other languages) while providing it with only PRQL source.

@aljazerzen
Copy link
Member Author

Oops, this seems to depend on a very old version of polars

Yeah. It also won't compile for wasm32-unknown-unknown without a serious effort, as we'd have to compile polars and rusqlite too. That does not seem like a battle worth taking right now.

@hulxv
Copy link
Contributor

hulxv commented Jan 25, 2024

My intention with Lutra is to be able to call it from Python (and other languages) while providing it with only PRQL source.

Then, Lutra is something like a library or do I misunderstand it?

@aljazerzen
Copy link
Member Author

Then, Lutra is something like a library or do I misunderstand it?

Yes, but also with an option to invoke it from the command line.

@max-sixty
Copy link
Member

Before a few months, I built a tool called prqlite (I think you took a look at it). Is there any difference between it and Lutra?

@hulxv this is cool! Nice work. I don't think we knew about this...

(the goals seem to be slightly different to Lutra fwiw)

- cmd: |
# remove trailing whitespace
rg '\s+$' --files-with-matches --glob '!*.{rs,snap}' . \
| xargs -I _ sh -c "echo Removing trailing whitespace from _ && sd '[\t ]+$' '' _"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FYI if you're in the mood for new tools, I really like rargs rather than xargs for this sort of thing. I almost never use xargs now)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...even if I still maintain that pre-commit is better for these sorts of lints! 😄

cargo \
llvm-cov --lcov --output-path lcov.info \
nextest \
{{.packages}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI if you want to put nextest & {{.packages}} into the base task totally fine w me. Then we have a single task and can pass packages.

(but also zero stress if you prefer to have them separate)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have a separate task that builds and tests only the thing that I'm currently developing.

What do you mean "pass packages"? To have the base task take packages as argument?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "pass packages"? To have the base task take packages as argument?

Currently this task has a {{.packages}} arg, so it's possible to pass a subset of packages

I like to have a separate task that builds and tests only the thing that I'm currently developing.

Very much agree! TBC, this task is almost identical to the task in the parent task, but passes -p lutra to .packages. So they could be combined.

(I can make a PR later, maybe that's an easier way of explaining)

@@ -0,0 +1,19 @@

## This module is configured to pull data from an SQLite chinook.db database
@(lutra.sqlite {file="chinook.db"})
Copy link
Member

@max-sixty max-sixty Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we want to think about coupling the target and the query?

I would think we want to use some form of composition, where we say "associate this chinook.db file with this module chinook" somewhere, but we don't couple their definitions, so it's possible to change.

At a more conceptual level, it's could be similar to applying a functor in OCaml (or somewhat implementing a trait in Rust) — specifying a mapping of conceptual queries/functions to concrete queries/functions. Another reference is dbt's approach for templating the source of a query, so it's possible to run the same queries on a production vs dev vs personal database / schema.

I don't have a really good idea for how to implement this, and obv fine to experiment with things. But I do think that a really good design would be to make decouple the query and the target.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about not having actual connection params within the PRQL source, similar to how dbt does it.

I've opted for this approach because it is way more convenient and it does not actually rule out composition - the queries are not defined within the database module. This means that one could swap out the database module and run the queries on a different database.

This could be done by defining the database module in a separate file, which is not committed to VCS or is overridden in production.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can think about this.

I think there's some nice design that's possible, where we map the relations to concrete tables. dbt do a nice job at the moment even if it needs a decent amount of jinja.

I'll meditate on it...

Copy link
Member

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting!

Maybe we put a readme of a couple of lines saying what it is and that it's still experimental?

Cargo.toml Outdated Show resolved Hide resolved
@eitsupi eitsupi requested a review from max-sixty February 2, 2024 14:38
@max-sixty max-sixty mentioned this pull request Feb 3, 2024
@max-sixty
Copy link
Member

FYI can merge despite the failures — one is a cargo-audit which is fine and we can ignore; the other is the drop in test coverage

@aljazerzen
Copy link
Member Author

I'd like to replace connector-x with connector-arrow before merging. ConnectorX has stale dependencies and does a lot of things that are not needed in for lutra.

@aljazerzen aljazerzen enabled auto-merge (squash) February 5, 2024 21:42
@aljazerzen aljazerzen merged commit 114fb19 into main Feb 5, 2024
79 of 81 checks passed
@aljazerzen aljazerzen deleted the feat-lutra branch February 5, 2024 22:02
This was referenced Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants