Polars LazyFrame show_graph has poor graphic quality in Marimo #3355

Open
kjgoodrick opened this issue Jan 7, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@kjgoodrick
Contributor

Description

I am submitting a PR for this suggestion.

When using polars LazyFrames it is often desirable to display the query plan prior to collecting the query. There are multiple ways to do this in Marimo.

  1. Leave the LazyFrame object as the last line in a cell.
    • This works well and displays a nice SVG image in the cell (as long as graphviz is installed and on the path). Image
  2. Manually call the show_graph function
    • This is necessary if one wants to show the optimized version of the query plan
    • Unfortunately, this produces very poor output if matplotlib is installed, or an error if it is not. Image
  3. Use explain to output a text-based query plan
    • This works well and Marimo even formats the text (whereas Jupyter notebooks by default show the raw string)
    • However, it can be harder to follow than the graph for more complicated queries. Image

It would be nice if Marimo could display high quality graphics for the query plan even when the user wants to view the optimized plan and/or doesn't have matplotlib / graphviz installed. It would also be nice if the graph followed the theme of the notebook (dark / light) and could nicely display plans that have many terms (e.g. a query that transforms many columns in one step).

Suggested solution

My suggestion is to register a polars extension that adds a marimo mo namespace to LazyFrames and allows users to display high quality query plan graphs in all situations with only a slight change to their code. This added polars extension code will be maintained within the marimo repository so that it will not require adding any code to the polars repository.
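
As a rough illustration of the registration step, polars exposes `pl.api.register_lazyframe_namespace` for attaching custom namespaces to LazyFrames. The sketch below is not the PR's implementation: the class name `MarimoPlanNamespace` and its `dot` method are hypothetical, and it assumes `show_graph(raw_output=True)` returns the DOT text as a string.

```python
class MarimoPlanNamespace:
    """Hypothetical `mo` namespace; all names here are illustrative only."""

    def __init__(self, lf):
        # polars passes the LazyFrame instance when `lf.mo` is accessed
        self._lf = lf

    def dot(self, optimized: bool = True) -> str:
        # Assumption: raw_output=True returns the DOT source as a string
        return self._lf.show_graph(optimized=optimized, raw_output=True)


try:
    import polars as pl

    # Makes `some_lazyframe.mo` available on every LazyFrame
    pl.api.register_lazyframe_namespace("mo")(MarimoPlanNamespace)
except ImportError:
    # polars is optional here; without it nothing is registered
    pass
```

With something like this registered, `lf.mo.dot()` would return the plan text, and only the `.mo` access would be new in user code.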

Because marimo already supports displaying mermaid graphs and polars can return the raw text defining the graph (in DOT notation), it makes sense to parse that text and convert it to mermaid.
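
To make the conversion idea concrete, here is a minimal sketch of a DOT-to-mermaid translation using only the standard library. It is not the code from the PR. The function name `dot_to_mermaid` and the sample DOT text are made up for illustration, and the sketch assumes a simple layout with one `id[label="..."]` node or one `a -- b` edge per line; the DOT that polars actually emits may differ between versions.

```python
import re

# One node per line: p1[label="..."], allowing escaped characters in the label
NODE_RE = re.compile(r'^\s*(\w+)\s*\[label="((?:[^"\\]|\\.)*)"\]\s*;?\s*$')
# One undirected edge per line: p1 -- p2
EDGE_RE = re.compile(r'^\s*(\w+)\s*--\s*(\w+)\s*;?\s*$')


def dot_to_mermaid(dot: str) -> str:
    """Translate a simple undirected DOT graph into mermaid flowchart text."""
    lines = ["graph TD"]
    for raw in dot.splitlines():
        node = NODE_RE.match(raw)
        if node:
            node_id, label = node.groups()
            # Unescape DOT quotes and turn DOT line breaks into HTML breaks
            label = label.replace('\\"', '"').replace("\\n", "<br>")
            # Mermaid labels are quoted, so inner quotes become entity codes
            label = label.replace('"', "#quot;")
            lines.append(f'    {node_id}["{label}"]')
            continue
        edge = EDGE_RE.match(raw)
        if edge:
            lines.append(f"    {edge.group(1)} --- {edge.group(2)}")
    return "\n".join(lines)


# Illustrative input only, not verbatim polars output
SAMPLE_DOT = r'''graph polars_query {
  p1[label="FILTER BY col(\"a\") > 1"]
  p2[label="TABLE\nPROJECT 2/3 COLUMNS"]
  p1 -- p2
}'''

print(dot_to_mermaid(SAMPLE_DOT))
```

In a marimo cell the resulting string could then be handed to `mo.mermaid(...)`, which is what lets the diagram follow the notebook theme and wrap long labels without graphviz or matplotlib.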

Once added, this approach would produce results that:

  1. Are natively high quality for all means of showing the graph
  2. Wrap the text in long columns
  3. Do not require graphviz or matplotlib to be installed
  4. Adapt to the user theme by default

| Case | Current | Proposed | Notes |
| --- | --- | --- | --- |
| Show Graph | Image | Image | |
| Light Mode Example | Image | Image | |
| LazyFrame last line | Image | Image | No change here; would likely require a change to polars unless marimo has a way to change the behavior of objects as the last line. |
| Join Graph | Image | Image | Current behavior sometimes does not fit on the screen if the width is not "right" |
| Wide example | Image | Image | Proposed shown with output expanded |

Alternative

Instead of outputting mermaid code, it might be possible to recreate polars' _repr_html_ in order to display a high-quality image of the graph. However, this would require the user to have dot installed, would not change color with the user theme, and would not give the line wrapping behavior for wide graphs.

Additional context

I think ultimately the best solution would be to have show_graph and _repr_html_ in polars recognize that they are in marimo and change their behavior. This would require their cooperation though, which is of unknown likelihood to me. I have written the implementation for the PR such that it would be easy for them to call marimo's functions if they detect a marimo environment (similar to what they already do for notebooks).

This snippet could be added to the polars display_dot_graph function, after the raw_output check, to get the same behavior when using polars' show_graph function.

try:
    from marimo import running_in_notebook
    from marimo._polars.lazyframe import Marimo 

    if running_in_notebook():
        return Marimo._dot_to_mermaid_html(dot)
except ImportError:
    pass
@kjgoodrick kjgoodrick added the enhancement New feature or request label Jan 7, 2025
@kjgoodrick kjgoodrick mentioned this issue Jan 7, 2025
4 tasks
@mscolnick
Contributor

mscolnick commented Jan 7, 2025

Hey @kjgoodrick this seems like a great idea/addition. You can actually easily add that logic to https://github.com/marimo-team/marimo/blob/main/marimo/_output/formatters/df_formatters.py

You can see what we do for dataframes (just call table()) and just call mermaid() for the lazyframe

@Mizokuiam

Thank you for raising this issue! I'll look into it and try to help if I can.
