Notes when working through the lesson #1

Open
8 tasks
katilp opened this issue Dec 19, 2023 · 8 comments

Comments

@katilp

katilp commented Dec 19, 2023

Just taking notes of what I observe; most can be ignored, but the 03 error needs action.

  • remove "only" on the start page, in the setup, and in 01-introduction, in the sentence "If you have already successfully installed and run the Docker example with python tools, then you need only execute the following command."
    • it sounds strange because it is followed by "additional prerequisites", so that command is not the only thing needed

01-introduction

  • Instructions for installing libraries in setup are repeated in 01-introduction
  • as I followed the Docker instructions and tried opening a notebook, I now see the following when I start JupyterLab:
    [screenshot not preserved]

02-analysislesson

  • This repeats the same intro that is already in 01-introduction. But I also see that 01-introduction is the part that is done separately by helpers before the actual lesson, and in that case, that's all fine. It may cause trouble if updates are needed, since both copies would have to be kept in sync.
  • If I'm just reading through the lesson, here I launch JupyterLab again, although I already did so in 01 (but that's OK if 01 is done separately beforehand)
  • it might be good to tone down the objectives of the lesson (unless the two "Learn about..." refer to the slides)

03-coffea

  • it would be good to say whether a new notebook should be opened or whether to continue in the one opened in 02-analysislesson ("the output above" in the instructions makes me think that I should continue in the same notebook, but I was not sure)
  • I get the following error for agc_events = NanoEventsFactory.from_root('root://eospublic.cern.ch//eos/opendata/cms/upload/od-workshop/ws2021/myoutput_odws2022-ttbaljets-prodv2.0_merged.root', schemaclass=AGCSchema, treepath='events').events()
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [18], in <cell line: 1>()
    ----> 1 agc_events = NanoEventsFactory.from_root('root://eospublic.cern.ch//eos/opendata/cms/upload/od-workshop/ws2021/myoutput_odws2022-ttbaljets-prodv2.0_merged.root', schemaclass=AGCSchema, treepath='events').events()
    
    File /usr/local/venv/lib/python3.10/site-packages/coffea/nanoevents/factory.py:289, in NanoEventsFactory.from_root(cls, file, treepath, entry_start, entry_stop, steps_per_file, runtime_cache, persistent_cache, schemaclass, metadata, uproot_options, access_log, iteritems_options, use_ak_forth, delayed)
        250 """Quickly build NanoEvents from a root file
        251 
        252 Parameters
       (...)
        283         Nanoevents will use dask as a backend to construct a delayed task graph representing your analysis.
        284 """
        286 if treepath is not uproot._util.unset and not isinstance(
        287     file, uproot.reading.ReadOnlyDirectory
        288 ):
    --> 289     raise ValueError(
        290         """Specification of treename by argument to from_root is no longer supported in coffea 2023.
        291     Please use one of the allowed types for "files" specified by uproot: https://github.com/scikit-hep/uproot5/blob/v5.1.2/src/uproot/_dask.py#L109-L132
        292     """
        293     )
        295 if delayed and steps_per_file is not uproot._util.unset:
        296     warnings.warn(
        297         f"""You have set steps_per_file to {steps_per_file}, this should only be used for a
        298         small number of inputs (e.g. for early-stage/exploratory analysis) since it does not
       (...)
        304         RuntimeWarning,
        305     )
    
    ValueError: Specification of treename by argument to from_root is no longer supported in coffea 2023.
                Please use one of the allowed types for "files" specified by uproot: https://github.com/scikit-hep/uproot5/blob/v5.1.2/src/uproot/_dask.py#L109-L132
    
    Note: Specification of treename by argument to from_root is no longer supported in coffea 2023.
@jmhogan
Collaborator

jmhogan commented Dec 19, 2023

@mattbellis Hooking you in here -- "treename" argument doesn't work anymore in "from_root".

@katilp
Author

katilp commented Dec 19, 2023

@alexander-held @oshadura Can you help with the last point? What should be changed in
agc_events = NanoEventsFactory.from_root(... for coffea 2023?
I tried a few things based on what I've seen around but did not find the right syntax...
Thanks!

@alexander-held

Hi, something like the following should work:

agc_events = NanoEventsFactory.from_root({'root://eospublic.cern.ch//eos/opendata/cms/upload/od-workshop/ws2021/myoutput_odws2022-ttbaljets-prodv2.0_merged.root': 'events'}, schemaclass=AGCSchema).events()

The tree name moves out of the keyword arguments and into the file specification, which becomes a dictionary of the form {filename: tree}.
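A minimal sketch contrasting the two call forms, assuming the only difference that matters here is where the tree name goes (the other coffea 2023 changes, such as the schema error further down, are not covered). The helper `from_root_call_args` is hypothetical and not part of coffea; it only returns the positional and keyword arguments one would pass to `NanoEventsFactory.from_root`, choosing between them from the major version number, assuming version strings such as "0.7.21" and "2023.12.0".

```python
def from_root_call_args(filename, tree, coffea_version):
    """Hypothetical helper: build (args, kwargs) for NanoEventsFactory.from_root.

    Assumes coffea 0.7.x takes the tree via treepath=, while coffea 2023.x
    expects the uproot-style {filename: tree} mapping instead.
    """
    major = int(coffea_version.split(".")[0])
    if major >= 2023:
        # coffea 2023: the tree name goes into the file specification dictionary
        return ({filename: tree},), {}
    # coffea 0.7: file and tree are passed separately, as in the lesson
    return (filename,), {"treepath": tree}


url = ("root://eospublic.cern.ch//eos/opendata/cms/upload/od-workshop/ws2021/"
       "myoutput_odws2022-ttbaljets-prodv2.0_merged.root")
print(from_root_call_args(url, "events", "2023.12.0"))
print(from_root_call_args(url, "events", "0.7.21"))
```

In the notebook the result would be unpacked as args, kwargs = from_root_call_args(url, "events", coffea.__version__) and passed on as NanoEventsFactory.from_root(*args, schemaclass=AGCSchema, **kwargs).events().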

You might need other changes in addition to this depending on what you want to run subsequently. The latest version of coffea introduces a large amount of conceptual and interface changes. An overview can be found at scikit-hep/coffea#775. We also intend to update our AGC notebook to this latest version of coffea, which hopefully will happen at the beginning of next year. This is being tracked in the (rather cryptic-looking) issue iris-hep/analysis-grand-challenge#116, with some early prototypes linked.

@katilp
Author

katilp commented Dec 19, 2023

Thanks @alexander-held

That gives me a different error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 agc_events = NanoEventsFactory.from_root({'root://eospublic.cern.ch//eos/opendata/cms/upload/od-workshop/ws2021/myoutput_odws2022-ttbaljets-prodv2.0_merged.root': 'events'}, schemaclass=AGCSchema).events()

File /usr/local/venv/lib/python3.10/site-packages/coffea/nanoevents/factory.py:314, in NanoEventsFactory.from_root(cls, file, treepath, entry_start, entry_stop, steps_per_file, runtime_cache, persistent_cache, schemaclass, metadata, uproot_options, access_log, iteritems_options, use_ak_forth, delayed)
    296     warnings.warn(
    297         f"""You have set steps_per_file to {steps_per_file}, this should only be used for a
    298         small number of inputs (e.g. for early-stage/exploratory analysis) since it does not
   (...)
    304         RuntimeWarning,
    305     )
    307 if (
    308     delayed
    309     and not isinstance(schemaclass, FunctionType)
    310     and schemaclass.__dask_capable__
    311 ):
    312     map_schema = _map_schema_uproot(
    313         schemaclass=schemaclass,
--> 314         behavior=dict(schemaclass.behavior()),
    315         metadata=metadata,
    316         version="latest",
    317     )
    319     to_open = file
    320     if isinstance(file, uproot.reading.ReadOnlyDirectory):

TypeError: 'property' object is not callable

If many other changes are needed, we could fall back to an older version of coffea, although it would be nice to have everything updated. These lessons are for a workshop at the beginning of January, which will probably be before the AGC updates.

@alexander-held

It sounds plausible to me that this custom schema would need to be updated for coffea 2023, but I am no expert in that regard. I'll add @lgray to this thread, who might know. For reference, an example of this schema is here: link to code.

I should note that this schema is used in conjunction with the custom ntuple files we derived from the public MiniAODs before we created NanoAODs. In general I would recommend updating the setup to instead use the NanoAOD version, which also means you can use the coffea-provided NanoAOD schema (and that works fine).

The changes from ntuple to NanoAOD setup came in the AGC repo via tag 1.0.0 and more specifically mostly via this PR: iris-hep/analysis-grand-challenge#102. We now have two branches in our AGC repository: main includes additional developments (e.g. a machine learning component), while the agc-v1 branch is very close in functionality to the previous ntuple-based setup (but instead uses NanoAODs).

The updates needed to go from the ntuple-based to the NanoAOD-based setup should not be too large. However, that will still require additional work to also play nicely with coffea 2023, and I can say from our side that we won't be able to provide an updated version in time for a workshop at the beginning of January.

@mattbellis
Collaborator

@katilp I just tried this but in the "Jupyter and coffea setup", instead of

pip install vector hist mplhep coffea cabinetry

I installed the older version of coffea

pip install vector hist mplhep coffea==0.7 cabinetry

and I was at least able to open the file. I have to run out so I don't have time to check the whole thing, but given that coffea 2023 just came out, I wonder if we should just use the older version for this workshop?
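If the workshop sticks with the older coffea, a version check in the first notebook cell could make a mismatched environment fail early with a clear message. This is a sketch under assumptions: the function name require_coffea_07 is hypothetical and it only inspects the version string. Note also that under PEP 440 the pin coffea==0.7 matches exactly 0.7.0 (missing release components are padded with zeros); if the intent is the newest 0.7.x release, coffea==0.7.* would be the pin to use.

```python
def require_coffea_07(version):
    """Hypothetical guard: raise if the installed coffea is not a 0.7.x release."""
    parts = version.split(".")
    if parts[:2] != ["0", "7"]:
        raise RuntimeError(
            f"This lesson assumes coffea 0.7.x, but found {version}; "
            "try: pip install 'coffea==0.7.*'"
        )
    return version


for candidate in ["0.7.21", "2023.12.0"]:
    try:
        print("ok:", require_coffea_07(candidate))
    except RuntimeError as err:
        print("error:", err)
```

In the notebook this would be called as import coffea followed by require_coffea_07(coffea.__version__).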

@katilp
Author

katilp commented Dec 20, 2023

@mattbellis Yes, I agree, we can use the older version

@lgray

lgray commented Dec 31, 2023

FYI This is a very easy fix. There was an interface change in BaseSchema in coffea 2023.

All you need to do is make AGCSchema.behavior not a @property, and this will work.

This was required due to the rather different way the new nanoevents backend works.
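The TypeError from the earlier traceback can be reproduced without coffea. A minimal sketch, assuming (as the traceback line dict(schemaclass.behavior()) suggests) that coffea 2023 calls behavior() on the schema class itself rather than on an instance. Turning the property into a @classmethod is an assumption here about how to make that call work; the class names and the stand-in behavior dictionaries are hypothetical placeholders for AGCSchema and coffea's behavior mappings.

```python
# Stand-in behavior dictionaries; in the real schema these would come from
# coffea.nanoevents.methods (hypothetical placeholders here).
BASE_BEHAVIOR = {"base": "base methods"}
VECTOR_BEHAVIOR = {"vector": "vector methods"}


class OldStyleSchema:
    """Mimics the older pattern: behavior exposed as a property."""

    @property
    def behavior(self):
        behavior = {}
        behavior.update(BASE_BEHAVIOR)
        behavior.update(VECTOR_BEHAVIOR)
        return behavior


class NewStyleSchema:
    """Same content, but behavior is a classmethod, so it is callable on the class."""

    @classmethod
    def behavior(cls):
        behavior = {}
        behavior.update(BASE_BEHAVIOR)
        behavior.update(VECTOR_BEHAVIOR)
        return behavior


def build_behavior(schemaclass):
    # Mirrors the traceback: the class itself (not an instance) is passed in
    # and behavior() is called on it.
    return dict(schemaclass.behavior())


try:
    build_behavior(OldStyleSchema)
except TypeError as err:
    print(f"old style fails: {err}")  # 'property' object is not callable

print(sorted(build_behavior(NewStyleSchema)))
```

On the class, a property is just a property object, so calling it fails; a classmethod receives the class and can be called without an instance.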

Please open issues on the coffea github for stuff like this in the future and I'll attend to it more promptly!

It is very likely we can make coffea 2023 work for your workshop, but, as stated in the announcement for analysis users, if you're trying to do anything where you need correct results now, sticking with 0.7 is probably the better way to go. There's always another workshop. :-)


5 participants