Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

marimo lint/tidy notebook.py #1543

Open
dmadisetti opened this issue Jun 2, 2024 · 5 comments
Open

marimo lint/tidy notebook.py #1543

dmadisetti opened this issue Jun 2, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@dmadisetti
Copy link
Collaborator

dmadisetti commented Jun 2, 2024

Description

Discussion in #1379 brings up the great point that tools like ruff will struggle with marimo since it cannot prune imports and unused variables (since all defs are returned)

Suggested solution

marimo can suggest values to prune based on the graph.
It can check to see if there are any dangling defs.

some mo specific lint hints:

  • If a def is utilized in only one cell, suggest turning it private.
  • If a def is not utilized, suggest removing it.

Additional opinionated tidy options

  • Suggestions to split or merge cells based on topology (e.g. 1 liners that are sequentially dependent or very large cells)
  • Suggestions to move cells with only code and no output to bottom of notebook
  • Suggestions to group imports into cells

This can be a wrapper/ preprocessor for a tool like ruff that can then check the notebook on a per cell level.

Alternative

Use tools like ruff as is, but miss out on marimo-specific clean-up

Additional context

No response

@dmadisetti
Copy link
Collaborator Author

dmadisetti commented Sep 14, 2024

I've been thinking about this a bit more. Notebooks are used to:

  1. show stuff
  2. or do stuff

lint should be able to determine if all code does either of these things.

It's easy to determine if code is used to "show stuff", as the code can be traced down to outputs.
Determining code that "does stuff" is a little harder, but can be made explicit.
It might be worth introducing the following api:

from marimo.save import side_effect

with side_effect("Pinging site"):
    requests.get("example.com")

lint can then determine whether code blocks feed into either an output, OR a side effect. Common side effects like "open" could be aliased, where:

with side_effect("write to file"):
    with open("file", "w") as f:
        f.write("x")

is flagged as not needed.


Side effect blocks can also be leveraged for native saving, in which case the block will be forced to run, even if the rest of the cell is cached.

@patrick-kidger
Copy link

Personally I'm -1 on this. I don't think Marimo should seek to reinvent every orthogonal piece of tooling -- this has always been one of the greatest weaknesses of Jupyter notebooks.

It should be possible to still achieve linting, type-checking etc. just by changing the serialize-as-.py format. For example, for the case of unused imports: just don't return them as outputs from the function. Marimo is already tracking downstream usages anyway, so this should be doable. Normal linters will then detect the unused import.

@dmadisetti
Copy link
Collaborator Author

Pruning returns would be easy to do, and is a great idea- certainly an elegant way of getting cells to conform to style.

I don't think having some organizational tool is a bad idea, but I think it needs to be notebook specific, and leave the python linting to the linters. Notebook specific suggestions might be:

  • detecting duplicate cells
  • detecting when state / ui changes may create unnecessary reruns
  • And maybe some hints like not using plt.show

The tooling for hints like these does not exist, and doesn't explicitly overlap with other style rules

Where there is alignment (e.g. not returning unused definitions, maybe leveraging top level imports/ consts), marimo should definitely lean into that.

@mscolnick
Copy link
Contributor

@patrick-kidger: For example, for the case of unused imports: just don't return them as outputs from the function.

I really like this idea, and maybe should be done by default.

@bossjones
Copy link

I started down this path a little bit cause i kept messing up my little POC project, my first time writing pylint plugins (ruff doesn't have support yet) so be nice :) not sure if I caught all of the use cases yet, but figured i'd share. Also great project!!! https://github.com/bossjones/prompt-library/blob/14f86f0ad9a6942e8f2f53b6afe9fa2a25ba0fb7/pylint/plugins/marimo_cell_params_validator.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants