VSCode debugger support: 'debug cell' #1325

Open
rfejgin opened this issue May 6, 2024 · 18 comments

@rfejgin

rfejgin commented May 6, 2024

Description

I am using marimo in VSCode via the extension. With Jupyter Notebooks, I have the option to 'debug' a cell rather than just run it, which hits VSCode breakpoints in the called code. I like this a lot since it lets me fluidly mix notebook-style execution and IDE-style debugging.

Does that way of working fit the marimo model at all? Or is the idea that if I want to debug I run the whole file directly from the VSCode debugger (without marimo)? If it's the former, it would be nice to have a 'debug cell' option.

Suggested solution

Implement a 'debug cell' option

Alternative

Document the intended approach to interactive debugging when one needs to examine code called by a marimo notebook. Searching the docs for 'debug' didn't turn up anything.

Additional context

No response

@rfejgin rfejgin changed the title VSCode debugger support: 'debug 'cell' VSCode debugger support: 'debug cell' May 6, 2024
@mscolnick
Contributor

Can you use import pdb; pdb.set_trace() within the cell?

@rfejgin
Author

rfejgin commented May 6, 2024

Yes. But this breaks into the pdb interface, not the VSCode visual debugger (where you can e.g. click to set a breakpoint). For folks used to using the IDE, switching to pdb is a pretty different way of working (e.g. examining variables also would need to be done via pdb).

Within the notebook itself using pdb isn't too bad, but my use case is one where the notebook calls other code - like my model implementation - and it's in that implementation where I want to set breakpoints (visually, ideally).

@mscolnick
Contributor

Got it, yea, the vscode debugger is much better than basic pdb.

It's not super trivial, but I think we would need to be able to run marimo with a debug/inspect flag that will launch debugpy in order to communicate with vscode.

I'm not sure when we will get to it; maybe a contributor can pick it up, or if there are enough 👍 we can try to prioritize it sooner.

@rfejgin
Author

rfejgin commented May 6, 2024

Cool, thanks for considering!

@rfejgin
Author

rfejgin commented May 6, 2024

By the way: I tried what I thought might be the alternative approach, which was to launch the script (notebook) directly from VSCode. But it seems that marimo creates a copy of the script somewhere in /tmp, so breakpoints set in VSCode in the original script are not hit.

@mscolnick
Contributor

Are you running it as a script (python notebook.py) or as an app (marimo edit notebook.py)? Either way, it's possible that vscode installs debugpy for you and maybe there is a way for us to declare that mapping.

@rfejgin
Author

rfejgin commented May 6, 2024

I'm running as a script, using VSCode's Run->Start Debugging. From examining the command line that generates, it does indeed appear to be calling debugpy.

@mscolnick
Contributor

Got it:

  1. I wonder if we can run our scripts without copying files to /tmp (or at least an option to)
  2. In the meantime, you could try marimo export notebook.py -o notebook.script.py to export the file to a flat script without marimo's cell decorators, which could be helpful while debugging. marimo export also supports --watch

@akshayka
Contributor

akshayka commented May 7, 2024

I wonder if we can run our scripts without copying files to /tmp (or at least an option to)

Using python notebook.py doesn't copy anything to /tmp. I wonder if VSCode is doing that.

@rfejgin
Author

rfejgin commented May 7, 2024

Using python notebook.py doesn't copy anything to /tmp. I wonder if VSCode is doing that.

Hmm, maybe it's not being copied after all. It's just that when I hit an error (e.g. if I imported something that can't be found), the exception says that it's in a file called e.g. /tmp/marimo_2545903/__marimo__cell_Hbol_.py. But when I examine /tmp I don't see that file.

Regardless, I can't seem to get normal VSCode breakpoints to get hit when running the script. I thought marimo was copying the file to /tmp but I guess not? In any case, I see marimo calling exec on the cell (in marimo._ast.cell.py.execute_cell()), maybe that's confusing the debugger?

By the way, this script comes from a Jupyter Notebook which I then converted to a script using marimo convert. I then run the resulting script in the VSCode debugger as I would for any script.

@akshayka
Contributor

akshayka commented May 8, 2024

Oh interesting, thanks for that context. Each cell is compiled and given a unique filename (which happens to be under /tmp). Maybe that confuses VSCode, or maybe it's the exec like you say.

It's surprising because this works at the command line (insert a breakpoint in your file with pdb.set_trace(), then run with python nb.py).
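The compile-and-exec mechanism described above can be reproduced with the standard library alone. The sketch below is not marimo's code: the cell source is made up and the filename is only modeled on the traceback quoted earlier. It shows that code compiled with compile(source, filename, "exec") and run with exec() reports the synthetic filename in its frames. VSCode breakpoints are set on file paths, so one plausible (unconfirmed) explanation is that no frame ever reports the path of notebook.py, whereas pdb.set_trace() simply stops in whichever frame calls it, regardless of filename.

```python
import traceback

# Hypothetical cell body, as it might be extracted from notebook.py.
CELL_SOURCE = "y = x + 1\nraise ValueError('stop here')\n"

# Hypothetical synthetic filename, modeled on the traceback quoted above.
SYNTHETIC_FILENAME = "/tmp/marimo_2545903/__marimo__cell_Hbol_.py"


def run_cell(source: str, filename: str, glbls: dict) -> list:
    """Compile a cell under `filename`, exec it, and return traceback filenames."""
    code = compile(source, filename, "exec")
    try:
        exec(code, glbls)
    except ValueError as exc:
        return [frame.filename for frame in traceback.extract_tb(exc.__traceback__)]
    return []


filenames = run_cell(CELL_SOURCE, SYNTHETIC_FILENAME, {"x": 2})
# The innermost frame belongs to the cell and reports the synthetic path,
# not the path of the notebook file the cell came from.
print(filenames[-1])
```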

@sorig

sorig commented May 13, 2024

+1 for interactive debugging support (not just pdb text interface). This is the main feature that's blocking me from moving our team from jupyter to marimo.

@githubpsyche

githubpsyche commented May 13, 2024

Here's an approach I've found for debugging inside VSCode. The gist of the strategy is to separate the decoration of cells with @app.cell from the declaration of the functions that make up those cells.

import marimo

__generated_with = "0.4.12"
app = marimo.App()

def defines_x():
    x = 2
    return x
 
def defines_y(x):
    y = x + 1
    y
    return y
 
def computes_z(x, y):
    z = x + y + 1
    z
    return z

def test_computes_z():
    assert computes_z(5, 3) == 9

app.cell(computes_z)
app.cell(defines_y)
app.cell(defines_x)

if __name__ == "__main__":
    test_computes_z()
    app.run()

Since you still call app.cell(computes_z), the cell will work normally when you use marimo run or execute the Python file in your terminal.

But at the same time, with this design, functions like computes_z can be used like any other function, and separately from marimo. You can set breakpoints inside of it and debug via test_computes_z either with VSCode's Tests extension or by using VSCode's Python Debugger (in command palette, you can start typing "Debug Python" to find this).
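Why this works: a decorator rebinds the function's name to whatever the decorator returns, while a plain call such as app.cell(computes_z) registers the function and leaves the name alone. The sketch below uses hypothetical FakeApp and FakeCell classes in place of marimo.App (assuming, as the Cell API reference suggests, that app.cell returns a cell object rather than the original function); it does not import marimo.

```python
class FakeCell:
    """Hypothetical stand-in for the object a cell decorator returns."""

    def __init__(self, fn):
        self.fn = fn


class FakeApp:
    """Hypothetical stand-in for marimo.App: cell() registers and wraps fn."""

    def __init__(self):
        self.cells = []

    def cell(self, fn):
        wrapped = FakeCell(fn)
        self.cells.append(wrapped)
        return wrapped


app = FakeApp()


@app.cell
def decorated(x, y):
    return x + y + 1


def computes_z(x, y):
    return x + y + 1


app.cell(computes_z)  # registered, but the name computes_z is not rebound

print(type(decorated).__name__)  # the decorated name now refers to the wrapper
print(computes_z(5, 3))  # still an ordinary function you can step into
```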

You can even treat the notebook as a module, importing specific functions like you'd import Python functions normally OR using the recipe described in the docs' Cell API reference, which seems to take advantage of the DAG specified in your notebook in a way that the base function could not.


Downsides:

  • While you can open the notebook in editor mode, once you save the notebook from the editor UI, it'll be back to its usual format.
  • You don't get to debug the DAG specified by your notebook this way. Each cell function is treated as relatively standalone. You'll have to pass arguments explicitly, even if you retrieve them from other cells.
  • If you've allowed other global variables in your script file (e.g., a global numpy import that you then use inside a cell), you might get different outputs from marimo run than in your test code.

I think this approach would work a lot more smoothly if marimo separated calls to app.cell from function definitions by default, and saved notebooks this way when changes are made in the web UI's editor mode. I don't think this would hinder the script file's readability much at all, and it would give notebooks more features!

With these changes, users would be defining ordinary, pure Python functions as a side effect of notebook development inside marimo. They could then reuse these functions anywhere they'd like, as if they weren't defined inside notebooks in the first place. At the same time, they'd still have access to the Cell instantiation of these functions, either inside or outside the notebook.

Still, being able to debug within the DAG context would be better, and that would not be addressed by these changes.
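The second downside listed above (having to pass arguments by hand) can be partly worked around while keeping the plain-function layout. The sketch below is a hypothetical helper, not part of marimo: it matches each function's parameter names to the variable another function defines (the DEFINES mapping is written by hand, an assumption), orders the functions with graphlib.TopologicalSorter, and calls them in turn. A breakpoint inside any of them is then hit with values computed upstream. It needs Python 3.9+ for graphlib.

```python
import inspect
from graphlib import TopologicalSorter


def defines_x():
    x = 2
    return x


def defines_y(x):
    y = x + 1
    return y


def computes_z(x, y):
    z = x + y + 1
    return z


# Hypothetical hand-written mapping: function -> name of the variable it defines.
DEFINES = {defines_x: "x", defines_y: "y", computes_z: "z"}


def run_in_dag_order(defines):
    """Call each function after the functions that define its parameters."""
    producer = {name: fn for fn, name in defines.items()}
    graph = {
        fn: {producer[p] for p in inspect.signature(fn).parameters}
        for fn in defines
    }
    env = {}
    for fn in TopologicalSorter(graph).static_order():
        args = {p: env[p] for p in inspect.signature(fn).parameters}
        env[defines[fn]] = fn(**args)  # breakpoints inside fn are hit normally
    return env


print(run_in_dag_order(DEFINES))
```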

@rfejgin
Author

rfejgin commented May 13, 2024

+1 for interactive debugging support (not just pdb text interface). This is the main feature that's blocking me from moving our team from jupyter to marimo.

Same here - this is the main thing stopping me from switching to marimo.

@akshayka akshayka self-assigned this May 14, 2024
@akshayka
Contributor

Thanks everyone for the thoughtful feedback.

I believe if we just ran the cells as functions, instead of exec-ing them, the debugger would work (as @githubpsyche suggests).

But app.run() also returns the visual outputs (last expression of each cell), which is why we exec/eval them.

We don't have a solution yet ... but just wanted to acknowledge that we hear you and hope to find one.
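For readers unfamiliar with the exec/eval split: a common way to capture a cell's visual output (IPython does something similar) is to parse the cell, exec every statement but the last, and eval the last one if it is an expression. The sketch below is a standard-library illustration of that technique, not marimo's actual implementation. Because the code runs from compiled source rather than as the function defined in the notebook file, breakpoints set on that function may not be hit.

```python
import ast


def run_cell_capturing_last_expr(source: str, filename: str, glbls: dict):
    """Exec all statements of a cell, then eval and return its last expression."""
    tree = ast.parse(source, filename=filename, mode="exec")
    last_expr = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Detach the trailing expression so it can be evaluated for its value.
        last_expr = ast.Expression(body=tree.body.pop().value)
    exec(compile(tree, filename, "exec"), glbls)
    if last_expr is None:
        return None  # the cell ends in a statement, so there is no visual output
    return eval(compile(last_expr, filename, "eval"), glbls)


cell_globals = {"x": 2}
print(run_cell_capturing_last_expr("y = x + 1\ny\n", "<cell>", cell_globals))
```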

@alefminus

alefminus commented May 16, 2024

Am I mistaken that a possible solution would be to return a tuple of the last expression and the return value, discarding the last expression when used as a notebook (via marimo module machinery) and using it as usual. Or am I missing something here?

Edit: I naively assumed you produce a python file anyway and instead of execing the code for a cell we could call the function (dynamically, per the DAG), but it seems that does not happen.
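One reading of this proposal, sketched with ordinary functions (the runner names are hypothetical and nothing here is marimo API): each cell function returns a pair of (last expression, defs); a notebook runner keeps the first element for display, while plain-module use drops it. Because the cell stays an ordinary function defined in the file, a breakpoint inside it would be hit under either runner.

```python
def defines_y(x):
    y = x + 1
    # Return (last expression, defs), as the comment above proposes.
    return y, (y,)


def run_for_notebook(cell, *args):
    """Hypothetical notebook runner: keep the visual output and the defs."""
    output, defs = cell(*args)
    return output, defs


def run_as_plain_function(cell, *args):
    """Hypothetical module use: discard the visual output, keep the defs."""
    _, defs = cell(*args)
    return defs


print(run_for_notebook(defines_y, 2))
print(run_as_plain_function(defines_y, 2))
```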

@githubpsyche

githubpsyche commented Jul 7, 2024

https://code.visualstudio.com/docs/python/debugging#_local-script-debugging

Seems to provide some clue for getting this to work.

You add a remote attach configuration to your .vscode/launch.json (VSCode helps set this up with an "Add Configuration" button at the bottom left of the file when it's open in the editor):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Remote Attach",
            "type": "debugpy",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "."
                }
            ]
        }
    ]
}

Then this is the gist of the pattern you use to start a debugging session:

import debugpy

# 5678 is the default attach port in the VS Code debug configurations. Unless a host and port are specified, host defaults to 127.0.0.1
debugpy.listen(5678)
print("Waiting for debugger attach")
debugpy.wait_for_client()
breakpoint()
print('break on this line')

I've found that if I put the debugging setup in its own cell (everything before breakpoint()), then calling breakpoint() does configure a debugging session in the marimo cell's context the way it's supposed to, without the hack I suggested earlier in this thread.

Maybe significantly, this works even if one is primarily developing with marimo's native UI, since it works over a (local) network connection.

Full example of a script with a breakpoint in it:

import marimo

__generated_with = "0.7.0"
app = marimo.App()


@app.cell
def __():
    import debugpy; debugpy.listen(5678); debugpy.wait_for_client()
    return debugpy,


@app.cell
def computes_z(x, y):
    z = x + y + 1
    z
    return z,


@app.cell
def defines_y(x):
    y = x + 1
    breakpoint()
    y
    return y,


@app.cell
def defines_x():
    x = 2
    return x,


if __name__ == "__main__":
    app.run()

Would be nice to find some way to smooth this workflow out further.

@mscolnick
Contributor

@githubpsyche - I really appreciate the detailed write-up.

It would be great to make this workflow smoother. Are there any obvious things we could do?

For example:

I am not too familiar with debugpy or VSCode's debugpy integration, so I would appreciate any help.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants