Support 'evaluateName' for dataframe columns #1672

Open
pwang347 opened this issue Sep 18, 2024 · 5 comments

@pwang347
Member

Hi there!

Our extension uses the evaluateName DAP property to let users view and interact with nested variables through evaluate expressions while debugging. This works well for top-level and nested data in simple lists, dictionaries, and class objects, but I've noticed some cases where the property is missing, e.g., the following:

Is there more general information or documentation on which scenarios this property is or isn't supported in (whether due to technical limitations or other reasons)?

I did also see #1439, but it wasn't clear to me if these were necessarily related to support for the evaluateName property.

Thanks!

@github-actions bot added the "needs repro" (Issue has not been reproduced yet) label on Sep 18, 2024
@rchiodo
Contributor

rchiodo commented Sep 18, 2024

I don't believe there's any documentation on limitations with eval, but you can log the output of the debugger and add logging around eval to debug it yourself. Eval basically happens here: https://github.com/fabioz/PyDev.Debugger/blob/main/_pydevd_bundle/pydevd_safe_repr.py

See this documentation on adding logging:
https://github.com/microsoft/debugpy/wiki/Enable-debugger-logs

What does it do when you eval a dataframe column?

@pwang347
Member Author

pwang347 commented Sep 18, 2024

Hi @rchiodo, thanks for the quick response.

Just to be clear, there is no problem with running an evaluate request on the column.
Here's the output of running d.Age in the terminal, which evaluates to what I would expect:

1.16s - Process EvaluateRequest: {
    "arguments": {
        "context": "repl",
        "expression": "d.Age",
        "format": {},
        "frameId": 2
    },
    "command": "evaluate",
    "seq": 38,
    "type": "request"
}

0.00s - processing internal command: InternalThreadCommands(<function internal_evaluate_expression_json at 0x107964f40>, (<_pydevd_bundle._debug_adapter.pydevd_schema.EvaluateRequest object at 0x118135a90>, 'pid_83663_id_4377147344'), {})
0.00s - sending cmd (http_json) -->           CMD_RETURN {"type": "response", "request_seq": 38, "success": true, "command": "evaluate", "body": {"result": "0      22.0\n1      38.0\n2      26.0\n3      35.0\n4      35.0\n       ... \n886    27.0\n887    19.0\n888     NaN\n889    26.0\n890    32.0\nName: Age, Length: 891, dtype: float64", "variablesReference": 9, "type": "Series", "presentationHint": {}}, "seq": 88, "pydevd_cmd_id": 502}

The problem I was referring to is this:

0.00s - Process VariablesRequest: {
    "arguments": {
        "format": {},
        "variablesReference": 6
    },
    "command": "variables",
    "seq": 41,
    "type": "request"
}

0.01s - processing internal command: InternalThreadCommands(<function internal_get_variable_json at 0x1079645e0>, (<_pydevd_bundle._debug_adapter.pydevd_schema.VariablesRequest object at 0x118135a90>,), {})
0.01s - sending cmd (http_json) -->           CMD_RETURN {"type": "response", "request_seq": 41, "success": true, "command": "variables", "body": {"variables": [...{"name": "Age", "value": "0      22.0\n1      38.0\n2      26.0\n3      35.0\n4      35.0\n       ... \n886    27.0\n887    19.0\n888     NaN\n889    26.0\n890    32.0\nName: Age, Length: 891, dtype: float64", "type": "Series", "variablesReference": 9}, ...

It seems that the resulting Variable object sometimes has an evaluateName property and sometimes not. In fact, this property is marked as optional in the DAP specification as well (I couldn't find in the spec or repository why this property is optional, aside from the assumed fact that there may be technical limitations):
[image: screenshot of the evaluateName entry in the DAP specification]

Compare with the top-level d variable which does have this property:
{"name": "d", "value": "...", "type": "DataFrame", "evaluateName": "d", "variablesReference": 6}

This property is important to us from an extension perspective since we don't know what code the user ran, but would like to be able to recreate the variable using an expression.
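As a rough illustration of the client-side problem, here is a minimal sketch of how an extension might fall back to a guessed expression when a Variable arrives without evaluateName. The function name `expression_for` and its return shape are hypothetical, not part of DAP or debugpy, and the guess is only a heuristic: it may not be a valid expression for every object type, which is why an adapter-supplied evaluateName is preferable.

```python
import keyword


def expression_for(parent_expr, child):
    """Return (expression, from_adapter) for a DAP Variable given as a dict.

    Hypothetical helper: `parent_expr` is the expression that produced the
    parent container (or None for a top-level variable) and `child` is one
    entry from a `variables` response body.
    """
    evaluate_name = child.get("evaluateName")
    if evaluate_name:
        # The adapter told us how to re-evaluate this value; trust it.
        return evaluate_name, True
    if parent_expr is None:
        # Nothing to build a guess from.
        return None, False
    name = child["name"]
    if name.isidentifier() and not keyword.iskeyword(name):
        # Guess attribute access, e.g. "d" + "Age" -> "d.Age".
        return f"{parent_expr}.{name}", False
    # Otherwise guess subscript access with the name as a string key.
    return f"{parent_expr}[{name!r}]", False
```

With the payloads from this thread, the top-level entry for d is returned unchanged and flagged as adapter-provided, while the Age entry yields the guess "d.Age" flagged as not adapter-provided.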

@rchiodo
Contributor

rchiodo commented Sep 18, 2024

So you were hoping evaluateName would list d.Age? I don't think the debugger is smart enough to figure that out at the moment. I believe that's what these resolvers do:

https://github.com/microsoft/debugpy/blob/main/src/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_resolver.py

If you added a DataFrame one there, it might be able to resolve the column? Might have to be more specialized than that.
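To make the idea concrete, here is a sketch of what a column-aware resolver could produce: for each column, a child entry paired with an expression that re-creates it. This is a standalone illustration and does not use pydevd's actual resolver interface; `column_children` and `FakeFrame` are hypothetical names, and the frame is only assumed to expose `columns` and `__getitem__` the way a DataFrame does.

```python
def column_children(frame, parent_expr):
    """List (name, value, evaluate_name) triples for each column of `frame`.

    Hypothetical sketch: subscript access is used for the expression
    because it works for any column label, not only identifier-like ones.
    """
    children = []
    for label in frame.columns:
        evaluate_name = f"{parent_expr}[{label!r}]"
        children.append((str(label), frame[label], evaluate_name))
    return children


class FakeFrame:
    """Tiny stand-in for a DataFrame so the sketch runs without pandas."""

    def __init__(self, data):
        self._data = dict(data)

    @property
    def columns(self):
        return list(self._data)

    def __getitem__(self, label):
        return self._data[label]
```

For a frame bound to d with an Age column, this yields the expression d['Age'], which selects the same column as d.Age.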

@pwang347
Member Author

pwang347 commented Sep 18, 2024

Got it, thanks for the clarification and the link! I'm mostly curious, though, why the default resolver seems to work for Python objects in general but not for pandas DataFrames specifically (or for the other scenarios I mentioned in the original post). For example, the following works (i.e., evaluateName is present for d.df, d.series, and d.test):

import pandas as pd
class Foo():
    df = pd.read_csv("http://raw.githubusercontent.com/pwang347/cdn/master/titanic.csv")
    series = df.Age

    @property
    def test(self):
        df = pd.read_csv("http://raw.githubusercontent.com/pwang347/cdn/master/titanic.csv")
        return df.Age
d = Foo()
d # <- breakpoint

Does this issue belong in https://github.com/fabioz/PyDev.Debugger instead? And is the recommendation to look into making a contribution there after discussing more with the owner @fabioz?

@rchiodo
Contributor

rchiodo commented Sep 18, 2024

You can make the contribution in either place. Debugpy is what ships with VS Code though, so any changes in PyDev.Debugger have to flow back to debugpy first.

My guess is that df.Age isn't working because it's dynamic: it's not in the attribute dictionary of a dataframe.
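If that guess is right, the effect can be reproduced without pandas. The sketch below uses a hypothetical `Table` class that serves column names through `__getattr__`, which Python only calls after normal attribute lookup fails. The column is reachable as an attribute but never appears in the instance dictionary, so anything that enumerates attributes from that dictionary would not see it. Whether the debugger's default resolver works exactly this way is an assumption here.

```python
class Table:
    """Hypothetical stand-in that exposes columns as dynamic attributes."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Reached only when regular lookup (instance dict, class) fails.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)


t = Table({"Age": [22.0, 38.0, 26.0]})

# Attribute access succeeds through __getattr__ ...
age = t.Age
# ... but the name is absent from the instance's attribute dictionary.
age_in_instance_dict = "Age" in vars(t)
```

By contrast, df and series in the Foo example above are ordinary class attributes, so they do live in a regular attribute dictionary.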

@rchiodo changed the title from "Improve documentation and support for DAP's evaluateName property" to "Support 'evaluateName' for dataframe columns" on Sep 25, 2024