generated from cms-opendata-workshop/workbench-template-md
commit f981063
Showing 20 changed files with 2,101 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
--- | ||
title: "Introduction" | ||
teaching: 10 | ||
exercises: 5 | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- What have we learned in the pre-learning lessons and how can we apply it? | |
- Where do we find information about physics objects in the CMS NanoAOD format? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Apply what we have learned in the pre-learning lessons about CMS physics objects | ||
- Learn about the documentation of the NanoAOD format | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
## Data formats in CMS | |
|
||
Most previous releases of CMS open data have been in the Analysis Object Data (AOD) format. | ||
This is a complex format and specific CMS software (CMSSW) is required in order to read and analyze it. | ||
|
||
Starting with the 2015 data, releases have used a slimmed-down format called MiniAOD, which has the same essential structure and software requirements for analysis as AOD. Compared to AOD, there are fewer | |
physics object collections, and the physics objects themselves are often different. | |
|
||
For the 2016 data and beyond, a new format called NanoAOD is used. NanoAOD is not simply a slimmed-down MiniAOD. In contrast to AOD and MiniAOD, which are stored as CMSSW C++ objects, NanoAOD is stored using ROOT TTree objects. You therefore do not need the CMS Virtual Machine or a Docker container to analyze NanoAOD data. NanoAOD can be analyzed using the ROOT program and/or Python libraries capable of interpreting ROOT's TTree structure. | |
|
||
In this workshop we will focus on working with open data in the NanoAOD format. | ||
|
||
## Physics objects in CMS data | ||
|
||
The recommended [CMS Physics Objects prelearning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) guides you through different physics objects and explains what information is available for them in the CMS NanoAOD format. | ||
|
||
Let us now make sure that you can find that information. | ||
|
||
::::::::::::::::::::: challenge | ||
|
||
## Exercise 1: Find the NanoAOD variable description for a physics object | ||
|
||
Select a physics object of your choice in the [CMS Physics Objects lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) and find the corresponding variable listing from a CMS dataset record on the [CERN Open Data portal](https://opendata.cern.ch/). | ||
|
||
:::::::::::::: solution | ||
|
||
For example, find the NanoAOD variable listing for the [SingleElectron collision dataset from 2016 RunG](https://opendata.cern.ch/record/30529). Scroll down to "Dataset semantics" and open the [variable list](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/NanoAOD/30529/SingleElectron_doc.html). | |
|
||
Find the links to the physics object collections under "Events Content" and find the object of your choice. Read the object descriptions provided in the [CMS Physics Objects pre-learning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html). | ||
|
||
:::::::::::::: | ||
|
||
:::::::::::::::::::: | ||
|
||
::::::::::::::::::::: challenge | ||
|
||
## Exercise 2: Compare variable lists in different collision datasets | |
|
||
Find all collision datasets from 2016 in NanoAOD format and compare their variable lists. Do Muon datasets contain an electron collection? Do Electron datasets contain a muon collection? | |
|
||
:::::::::::::: solution | ||
|
||
Use the search facets of the [search page](https://opendata.cern.ch/search?q=&l=list&order=desc&p=1&s=10&sort=mostrecent). | ||
|
||
Select **Collision** under Dataset, **CMS** under Experiment, **2016** under Year, and **nanoaod** under File type. | |
|
||
Open two different collision datasets and check their variable lists. | ||
|
||
:::::::::::::: | ||
|
||
:::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- A variable list with a brief description of each variable is linked from every CMS NanoAOD dataset record. | |
- The CMS Physics Objects pre-learning lesson describes the physics object variables in more detail. | |
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
--- | ||
title: "NanoAOD datasets" | ||
teaching: 10 | ||
exercises: 0 | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- How do we find a specific nanoAOD dataset? | ||
- How do we explore the content of our nanoAOD dataset? | |
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Know how to find nanoAOD datasets | ||
- Know how to explore the content of nanoAOD | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
## Find and explore a nanoAOD dataset | ||
|
||
Let's find and explore a particular dataset that we will look at in more detail | |
later: simulated Z' events in which the Z' decays to a top and antitop quark pair. | |
|
||
:::::: callout | ||
A Z' ("Z-prime") is a hypothetical heavy gauge boson that could arise from | |
extensions of the Standard Model. A review of searches for the Z' | |
can be found [here](https://pdg.lbl.gov/2024/reviews/rpp2024-rev-zprime-searches.pdf). | |
:::::: | ||
|
||
### Find the dataset | ||
|
||
All data can be found via the [CERN Open Data Portal](https://opendata.cern.ch). | ||
Let's go to the website and search for the simulated Z' datasets. | |
|
||
Dataset naming in CMS can seem obscure, so let's start with something simple and search for "Zprime*": | |
|
||
![](fig/ZprimeODP.png){alt='Search for Zprime* at the CODP'} | ||
|
||
The query results are [here](https://opendata.cern.ch/search?q=Zprime%2A&l=list&order=asc&p=1&s=10&sort=bestmatch) and you can see that there are many (over 1000) records returned: | ||
|
||
![](fig/ZprimeODP-results.png){alt='Search results for Zprime*'} | ||
|
||
Let's narrow down the results and select "Type: Dataset", "Experiment: CMS", "Year: 2016", "File type: nanoaodsim", and "Category: Heavy Gauge Bosons". We've now reduced the number of [matches](https://opendata.cern.ch/search?q=Zprime%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch) from over 1000 down to 210: | ||
|
||
![](fig/ZprimeODP-results2.png){alt='Narrowed search results for Zprime*'} | ||
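
The facet selection above is encoded directly in the search URL. As a minimal sketch (the helper name `build_search_url` is hypothetical, and the parameter layout is inferred only from the link above, not from a documented portal API), the same link can be assembled with Python's standard library:

```python
from urllib.parse import quote, urlencode

PORTAL_SEARCH = "https://opendata.cern.ch/search"


def build_search_url(query, facets):
    """Return a portal search URL for a query string and a list of facet filters.

    Hypothetical helper: the parameter names are copied from the link in the text.
    """
    params = [("q", query)]
    # Each facet becomes its own repeated "f" parameter.
    params += [("f", facet) for facet in facets]
    # Display settings copied from the link in the text.
    params += [("l", "list"), ("order", "asc"), ("p", "1"), ("s", "10"), ("sort", "bestmatch")]
    # quote (instead of the default quote_plus) encodes spaces as %20, as in the link.
    return PORTAL_SEARCH + "?" + urlencode(params, quote_via=quote)


facets = [
    "experiment:CMS",
    "year:2016",
    "file_type:nanoaodsim",
    "category:Exotica+subcategory:Heavy Gauge Bosons",
    "type:Dataset",
]
url = build_search_url("Zprime*", facets)
print(url)
```

Changing the query string or the list of facets gives the corresponding link for any other selection, which can be handy for sharing a search with others.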
|
||
We can discern some of the logic behind the simulated dataset naming. "Zprime" is the particle produced, and it decays to various products. We want $Z^{'} \rightarrow t\bar{t}$, which shows up as the third result, so let's [narrow the search](https://opendata.cern.ch/search?q=ZprimeToTT%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch) further and search for "ZprimeToTT*": | |
|
||
![](fig/ZprimeToTT-results.png){alt='Search results for ZprimeToTT*'} | |
|
||
We can also see that the dataset names include the mass (in GeV) of the hypothetical Z' (e.g. "_M2000"). | |
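
To make the naming pattern concrete, here is a small sketch that splits a dataset name on underscores and picks out the mass token. The function `parse_dataset_name` is hypothetical, and the example name is the one of record 75124 used in the exercises episode. Only the `M<number>` token is interpreted, since that is the only part explained here. All remaining tokens are kept uninterpreted.

```python
import re


def parse_dataset_name(name):
    """Split a simulated dataset name into underscore-separated tokens.

    Hypothetical helper: only the "M<number>" token (Z' mass in GeV) is decoded.
    """
    tokens = name.split("_")
    info = {"process": tokens[0], "mass_gev": None, "other": []}
    for token in tokens[1:]:
        match = re.fullmatch(r"M(\d+)", token)
        if match:
            info["mass_gev"] = int(match.group(1))
        else:
            # Meaning not covered in this episode; keep the raw token.
            info["other"].append(token)
    return info


info = parse_dataset_name("ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8")
print(info["process"], info["mass_gev"], info["other"])
```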
|
||
TO-DO: what do the other strings mean in the dataset name? | ||
|
||
Possible challenge: have them select a mass and search for the dataset and select a file for the next part. | ||
|
||
Next, let's use the `cernopendata-client` command-line tool to find the datasets | ||
and fetch a file. | ||
|
||
### Explore a file | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor | ||
|
||
Inline instructor notes can help inform instructors of timing challenges | ||
associated with the lessons. They appear in the "Instructor View" | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- Use the search box and the search facets of the CERN Open Data Portal to find nanoAOD datasets. | |
- Simulated dataset names encode the physics process and parameters such as the Z' mass in GeV. | |
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
--- | ||
title: "NanoAOD exercises" | ||
teaching: 10 | ||
exercises: 0 | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- What have we learned in the pre-exercises and how can we apply it? | ||
- What is the structure and content of the nanoAOD format? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Apply what we have learned in the pre-exercises | ||
- Learn about the structure and content of nanoAOD | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
## Exercises with NanoAOD | ||
|
||
::::::::::::::::::::::::::::::::::::: challenge | ||
|
||
## Exercise 1: Get data file locations | ||
|
||
Select a ZprimeToTT sample for a given mass and use | |
the `cernopendata-client` to get the locations of the associated data files. | |
|
||
Recall what you've learned from the [pre-exercise](https://cms-opendata-workshop.github.io/workshop2024-lesson-dataset-scouting/instructor/04-cli-through-cernopendata-client.html) on the `cernopendata-client`. | ||
|
||
:::::::::::::::::::::::: solution | ||
|
||
## Solution | ||
|
||
Search for the ZprimeToTT samples in the CERN Open Data Portal. The resulting query is [here](https://opendata.cern.ch/search?q=ZprimeToTT%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch). | ||
|
||
Next, select a dataset. Here we fetch [this one](https://opendata.cern.ch/record/75124), record 75124, "Simulated dataset ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8 in NANOAODSIM format for 2016 collision data" where the Z' mass is 1000 GeV. | ||
|
||
Fetch the Docker image for the `cernopendata-client`: | ||
|
||
```bash | ||
docker pull docker.io/cernopendata/cernopendata-client | ||
``` | ||
|
||
and refresh your memory on the commands: | ||
```bash | ||
docker run -i -t --rm docker.io/cernopendata/cernopendata-client --help | ||
``` | ||
```output | ||
Usage: cernopendata-client [OPTIONS] COMMAND [ARGS]... | ||
Command-line client for interacting with CERN Open Data portal. | ||
Options: | ||
--help Show this message and exit. | ||
Commands: | ||
download-files Download data files belonging to a record. | ||
get-file-locations Get a list of data file locations of a record. | ||
get-metadata Get metadata content of a record. | ||
list-directory List contents of a EOSPUBLIC Open Data directory. | ||
verify-files Verify downloaded data file integrity. | ||
version Return cernopendata-client version. | ||
``` | ||
|
||
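Then list the file locations for record 75124 (if you use the Docker image, replace `cernopendata-client` with the `docker run` command shown above): | |
```bash | |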
cernopendata-client get-file-locations --recid 75124 | ||
``` | ||
|
||
```output | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/65A0736B-22F3-C94C-99AE-36717B28629C.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/6E508763-A12F-8846-A295-F39EE7DDAA52.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2530000/7B2D5CD5-9CAE-C046-A9AB-50CE9D48B187.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/260000/1A50245D-8213-6340-8EA0-CB064EEC6AF3.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/09FA6C37-21D6-7846-B3E1-F8086CBA0E9E.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/3AAB5B1E-7169-9C4D-841C-CB2D6E40CBAE.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/820C3EBC-0E1D-CE41-9418-FA1615123FC2.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/CF54D079-349C-FB4F-B6E5-3D579D89EDE4.root | ||
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/80000/E964C281-43FB-D349-A436-9A3FDA0BAA28.root | ||
``` | ||
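
The list of locations is plain text, so it is easy to post-process. The sketch below uses only Python's standard library and the first two locations from the output above. The helper `to_xrootd` is hypothetical, and the `root://eospublic.cern.ch/` prefix is an assumption about the XRootD endpoint of the portal that you should check before relying on it.

```python
from posixpath import basename
from urllib.parse import urlsplit

# First two locations from the output above.
locations = """\
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/65A0736B-22F3-C94C-99AE-36717B28629C.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/6E508763-A12F-8846-A295-F39EE7DDAA52.root
""".split()


def to_xrootd(http_url, server="root://eospublic.cern.ch/"):
    """Rewrite an HTTP location as an XRootD-style URL (server name is an assumption)."""
    path = urlsplit(http_url).path  # starts with "/eos/opendata/..."
    return server + path


# File names are the last component of each location path.
file_names = [basename(urlsplit(url).path) for url in locations]
print(len(locations), "files")
print(file_names[0])
print(to_xrootd(locations[0]))
```

In practice you would save the full output of `get-file-locations` to a text file and read the locations from there instead of pasting them into the script.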
::::::::::::::::::::::::::::::::: | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::: challenge | ||
|
||
## Exercise 2: Inspect the data file | ||
|
||
:::::::::::::: solution | ||
|
||
|
||
:::::::::::::: | ||
|
||
:::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- Run `cernopendata-client --help` to see the available commands. | |
- The `get-file-locations` command lists the data file locations of a record, given its record ID. | |
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
--- | ||
title: "Contributor Code of Conduct" | ||
--- | ||
|
||
As contributors and maintainers of this project, | ||
we pledge to follow [The Carpentries Code of Conduct][coc]. | |
|
||
All workshop participants are also expected to follow the [CERN Code of Conduct](https://cds.cern.ch/record/2240689/files/BrochureCodeofConductEN.pdf?version=1). | |
|
||
Instances of abusive, harassing, or otherwise unacceptable behavior | ||
may be reported by following our [reporting guidelines][coc-reporting]. | ||
|
||
|
||
[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html | ||
[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html |