-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: lutra query runner #4134
feat: lutra query runner #4134
Conversation
8f1293f
to
965f199
Compare
ac9bf73
to
eb1213d
Compare
Before a few months, I built a tool called prqlite (I think you took a look at it). Is there any difference between it and Lutra? |
Oops, this seems to depend on a very old version of polars....... |
There are. prqlite is primarily a TUI and it does that well. My intention with Lutra is to be able to call it from Python (and other languages) while providing it with only PRQL source. |
Yeah. It also won't compile for wasm32-unknown-unknown without a serious effort, as we'd have to compile polars and rusqlite too. That does not seem like a battle worth taking right now. |
Then, Lutra is something like a library or do I misunderstand it? |
Yes, but also with an option to invoke it from the command line. |
- cmd: | | ||
# remove trailing whitespace | ||
rg '\s+$' --files-with-matches --glob '!*.{rs,snap}' . \ | ||
| xargs -I _ sh -c "echo Removing trailing whitespace from _ && sd '[\t ]+$' '' _" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(FYI if you're in the mood for new tools, I really like rargs
rather than xargs
for this sort of thing. I almost never use xargs
now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...even if I still maintain that pre-commit
is better for these sorts of lints! 😄
cargo \ | ||
llvm-cov --lcov --output-path lcov.info \ | ||
nextest \ | ||
{{.packages}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI if you want to put nextest
& {{.packages}}
into the base task totally fine w me. Then we have a single task and can pass packages
.
(but also zero stress if you prefer to have them separate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to have a separate task that builds and tests only the thing that I'm currently developing.
What do you mean "pass packages"? To have the base task take packages as argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean "pass packages"? To have the base task take packages as argument?
Currently this task has a {{.packages}}
arg, so it's possible to pass a subset of packages
I like to have a separate task that builds and tests only the thing that I'm currently developing.
Very much agree! TBC, this task is almost identical to the task in the parent task, but passes -p lutra
to .packages
. So they could be combined.
(I can make a PR later, maybe that's an easier way of explaining)
@@ -0,0 +1,19 @@ | |||
|
|||
## This module is configured to pull data from an SQLite chinook.db database | |||
@(lutra.sqlite {file="chinook.db"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we want to think about coupling the target and the query?
I would think we want to use some form of composition, where we say "associate this chinook.db
file with this module chinook
" somewhere, but we don't couple their definitions, so it's possible to change.
At a more conceptual level, it's could be similar to applying a functor in OCaml (or somewhat implementing a trait in Rust) — specifying a mapping of conceptual queries/functions to concrete queries/functions. Another reference is dbt's approach for templating the source of a query, so it's possible to run the same queries on a production vs dev vs personal database / schema.
I don't have a really good idea for how to implement this, and obv fine to experiment with things. But I do think that a really good design would be to make decouple the query and the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about not having actual connection params within the PRQL source, similar to how dbt does it.
I've opted for this approach because it is way more convenient and it does not actually rule out composition - the queries are not defined within the database module. This means that one could swap out the database module and run the queries on a different database.
This could be done by defining the database module in a separate file, which is not committed to VCS or is overridden in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can think about this.
I think there's some nice design that's possible, where we map the relations to concrete tables. dbt do a nice job at the moment even if it needs a decent amount of jinja.
I'll meditate on it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting!
Maybe we put a readme of a couple of lines saying what it is and that it's still experimental?
FYI can merge despite the failures — one is a |
I'd like to replace connector-x with connector-arrow before merging. ConnectorX has stale dependencies and does a lot of things that are not needed in for lutra. |
8a1cf65
to
01f3f2d
Compare
Initial take on #3825. Inspired by prql-query.
Adds a query runner named Lutra that is basically a wrapper for
prqlc compile
,sqlite3 run
andprint(pl.DataFrame)
.Connection parameters are defined within PRQL source with
@lutra.sqlite
annotation.Uses connectorx for executing the query and converting to Apache Arrow. Uses polars to print the dataframe.
Next step is to create Python bindings so the resulting dataframe can be used with other existing tooling.