source commit: bb3b2d8
actions-user committed Jul 24, 2024
0 parents commit f981063
Showing 20 changed files with 2,101 additions and 0 deletions.
79 changes: 79 additions & 0 deletions 01-introduction.md
@@ -0,0 +1,79 @@
---
title: "Introduction"
teaching: 10
exercises: 5
---

:::::::::::::::::::::::::::::::::::::: questions

- What have we learned in the pre-learning lessons and how can we apply it?
- Where do we find information about physics objects in the CMS NanoAOD format?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Apply what we have learned in the pre-learning lessons about CMS physics objects
- Learn about the documentation of the NanoAOD format

::::::::::::::::::::::::::::::::::::::::::::::::

## Data formats in CMS

Most previous releases of CMS open data have been in the Analysis Object Data (AOD) format.
This is a complex format, and specific CMS software (CMSSW) is required to read and analyze it.

Starting with the 2015 data, releases have been in a slimmed-down format called MiniAOD, which has the same essential structure and software requirements for analysis as AOD. Compared to AOD, there are fewer
physics object collections, and the physics objects themselves are often different.

For the 2016 data and beyond, a new format called NanoAOD is used. NanoAOD is not simply a slimmed-down MiniAOD. In contrast to AOD and MiniAOD, which are stored as CMSSW C++ objects, NanoAOD is stored using ROOT TTree objects. You therefore do not need the CMS Virtual Machine or Docker container to analyze NanoAOD data. NanoAOD can be analyzed using the ROOT program and/or Python libraries capable of interpreting ROOT's TTree structure.

In this workshop we will focus on working with open data in the NanoAOD format.
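
Because NanoAOD is a flat TTree, each physics object collection appears as a set of branches that share a name prefix (for example `Muon_pt` and `Muon_eta`, with a counter such as `nMuon`). The sketch below is only an illustration of that naming pattern: the helper `group_branches` and the short list of branch names are hypothetical and not part of any CMS tool.

```python
from collections import defaultdict


def group_branches(branch_names):
    """Group flat branch names by the prefix before the first underscore."""
    groups = defaultdict(list)
    for name in branch_names:
        prefix, sep, variable = name.partition("_")
        if sep:
            # e.g. "Muon_pt" -> collection "Muon", variable "pt"
            groups[prefix].append(variable)
        else:
            # names without an underscore, e.g. "run", "event", "nMuon"
            groups["(no collection)"].append(name)
    return dict(groups)


# Hypothetical subset of branch names, for illustration only
branches = ["run", "event", "nMuon", "Muon_pt", "Muon_eta", "nElectron", "Electron_pt"]
grouped = group_branches(branches)
for collection, variables in grouped.items():
    print(collection, variables)
```

With a library such as `uproot`, the real list of branch names could be taken from the `Events` tree of a NanoAOD file instead of the hand-written list used here.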

## Physics objects in CMS data

The recommended [CMS Physics Objects prelearning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) guides you through different physics objects and explains what information is available for them in the CMS NanoAOD format.

Let us now make sure that you can find that information.

::::::::::::::::::::: challenge

## Exercise 1: Find the NanoAOD variable description for a physics object

Select a physics object of your choice in the [CMS Physics Objects lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) and find the corresponding variable listing from a CMS dataset record on the [CERN Open Data portal](https://opendata.cern.ch/).

:::::::::::::: solution

For example, find the NanoAOD variable listing for the [SingleElectron collision dataset from 2016 RunG](https://opendata.cern.ch/record/30529). Scroll down to "Dataset semantics" and open the [variable list](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/NanoAOD/30529/SingleElectron_doc.html).

Find the links to the physics object collections under "Events Content" and find the object of your choice. Read the object descriptions provided in the [CMS Physics Objects pre-learning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html).

::::::::::::::

::::::::::::::::::::

::::::::::::::::::::: challenge

## Exercise 2: Compare variable lists in different collision datasets

Find all collision datasets from 2016 in NanoAOD format. Compare their variable lists. Do Muon datasets contain an electron collection? Do Electron datasets contain a muon collection?

:::::::::::::: solution

Use the search facets of the [search page](https://opendata.cern.ch/search?q=&l=list&order=desc&p=1&s=10&sort=mostrecent).

Select **Collision** under Dataset, **CMS** under Experiment, **2016** under Year, and **nanoaod** under File type.

Open two different collision datasets and check their variable lists.

::::::::::::::

::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- A variable list with a brief description of each variable is linked from every CMS NanoAOD dataset record.
- The CMS Physics Objects pre-learning lesson describes the different physics object variables in more detail.

::::::::::::::::::::::::::::::::::::::::::::::::

261 changes: 261 additions & 0 deletions 02-nanoaod-miniaod.md

Large diffs are not rendered by default.

80 changes: 80 additions & 0 deletions 03-nanoaod-dataset.md
@@ -0,0 +1,80 @@
---
title: "NanoAOD datasets"
teaching: 10
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- How do we find a specific nanoAOD dataset?
- How do we explore the content of our nanoAOD dataset?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Know how to find nanoAOD datasets
- Know how to explore the content of nanoAOD

::::::::::::::::::::::::::::::::::::::::::::::::

## Find and explore a nanoAOD dataset

Let's find and explore a particular dataset that we will look at in more detail
later: simulated Z' events in which the Z' decays to a top and antitop quark pair.

:::::: callout
A Z' ("Z-prime") is a hypothetical heavy gauge boson that could come from
extensions of the Standard Model. A review of searches for the Z'
can be found [here](https://pdg.lbl.gov/2024/reviews/rpp2024-rev-zprime-searches.pdf).
::::::

### Find the dataset

All data can be found via the [CERN Open Data Portal](https://opendata.cern.ch).
Let's go to the website and search for the simulated Z' datasets.

Dataset naming in CMS can seem obscure but let's do something simple and search for "Zprime*":

![](fig/ZprimeODP.png){alt='Search for Zprime* at the CODP'}

The query results are [here](https://opendata.cern.ch/search?q=Zprime%2A&l=list&order=asc&p=1&s=10&sort=bestmatch) and you can see that there are many (over 1000) records returned:

![](fig/ZprimeODP-results.png){alt='Search results for Zprime*'}

Let's narrow down the results and select "Type: Dataset", "Experiment: CMS", "Year: 2016", "File type: nanoaodsim", and "Category: Heavy Gauge Bosons". We've now reduced the number of [matches](https://opendata.cern.ch/search?q=Zprime%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch) from over 1000 down to 210:

![](fig/ZprimeODP-results2.png){alt='Narrowed search results for Zprime*'}
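
The facet selections above end up as repeated `f=` parameters in the search URL. As a minimal sketch, assuming the parameter layout seen in the query link above stays the same, such a URL can be assembled with the Python standard library. The function name `build_search_url` is hypothetical and not part of any portal API.

```python
from urllib.parse import quote, urlencode

PORTAL_SEARCH = "https://opendata.cern.ch/search"


def build_search_url(query, facets, page=1, size=10):
    """Assemble a portal search URL from a query string and facet selections."""
    params = [("q", query)]
    # Each facet selection becomes its own "f" parameter
    params += [("f", facet) for facet in facets]
    # Display options copied from the query link in this episode
    params += [
        ("l", "list"),
        ("order", "asc"),
        ("p", str(page)),
        ("s", str(size)),
        ("sort", "bestmatch"),
    ]
    # quote_via=quote encodes spaces as %20 rather than "+"
    return PORTAL_SEARCH + "?" + urlencode(params, quote_via=quote)


facets = [
    "experiment:CMS",
    "year:2016",
    "file_type:nanoaodsim",
    "category:Exotica+subcategory:Heavy Gauge Bosons",
    "type:Dataset",
]
url = build_search_url("Zprime*", facets)
print(url)
```

Changing the query string to `ZprimeToTT*` with the same facets gives the narrower search used in the next step.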

We can discern some of the logic behind the simulated dataset naming. "Zprime" is the particle produced, and it decays to various products. We want $Z^{'} \rightarrow t\bar{t}$, which shows up as the third result, so let's [narrow the search](https://opendata.cern.ch/search?q=ZprimeToTT%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch) further and search with "ZprimeToTT*":

![](fig/ZprimeToTT-results.png){alt='Narrowed search results for ZprimeToTT*'}

We can also see that the dataset names include the mass (in GeV) of the hypothetical Z' (e.g. "_M2000").
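
The mass tag can be read out of a dataset name programmatically. This is a small sketch that assumes the mass always appears as `_M` followed by digits, as in the example above. The helper `zprime_mass_gev` is hypothetical, and the dataset name used is the one from record 75124 on the portal. The other strings in the name are deliberately left uninterpreted.

```python
import re


def zprime_mass_gev(dataset_name):
    """Return the Z' mass in GeV encoded as "_M<digits>", or None if absent."""
    match = re.search(r"_M(\d+)", dataset_name)
    return int(match.group(1)) if match else None


name = "ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8"
mass = zprime_mass_gev(name)
print(mass)
```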

TO-DO: what do the other strings mean in the dataset name?

Possible challenge: have them select a mass and search for the dataset and select a file for the next part.

Next, let's use the `cernopendata-client` command-line tool to find the datasets
and fetch a file.

### Explore a file

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

Inline instructor notes can help inform instructors of timing challenges
associated with the lessons. They appear in the "Instructor View"

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- Use `.md` files for episodes when you want static content
- Use `.Rmd` files for episodes when you need to generate output
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
- Run `sandpaper::build_lesson()` to preview your lesson locally

::::::::::::::::::::::::::::::::::::::::::::::::

[r-markdown]: https://rmarkdown.rstudio.com/
107 changes: 107 additions & 0 deletions 04-nanoaod-exercises.md
@@ -0,0 +1,107 @@
---
title: "NanoAOD exercises"
teaching: 10
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- What have we learned in the pre-exercises and how can we apply it?
- What is the structure and content of the nanoAOD format?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Apply what we have learned in the pre-exercises
- Learn about the structure and content of nanoAOD

::::::::::::::::::::::::::::::::::::::::::::::::

## Exercises with NanoAOD

::::::::::::::::::::::::::::::::::::: challenge

## Exercise 1: Get data file locations

Let's select a ZprimeToTT sample for a given mass and use
the `cernopendata-client` to get the associated data files.

Recall what you've learned from the [pre-exercise](https://cms-opendata-workshop.github.io/workshop2024-lesson-dataset-scouting/instructor/04-cli-through-cernopendata-client.html) on the `cernopendata-client`.

:::::::::::::::::::::::: solution

## Solution

Search for the ZprimeToTT samples in the CERN Open Data Portal. The resulting query is [here](https://opendata.cern.ch/search?q=ZprimeToTT%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch).

Next, select a dataset. Here we fetch [this one](https://opendata.cern.ch/record/75124), record 75124, "Simulated dataset ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8 in NANOAODSIM format for 2016 collision data" where the Z' mass is 1000 GeV.

Fetch the Docker image for the `cernopendata-client`:

```bash
docker pull docker.io/cernopendata/cernopendata-client
```

and refresh your memory on the commands:
```bash
docker run -i -t --rm docker.io/cernopendata/cernopendata-client --help
```
```output
Usage: cernopendata-client [OPTIONS] COMMAND [ARGS]...

  Command-line client for interacting with CERN Open Data portal.

Options:
  --help  Show this message and exit.

Commands:
  download-files      Download data files belonging to a record.
  get-file-locations  Get a list of data file locations of a record.
  get-metadata        Get metadata content of a record.
  list-directory      List contents of a EOSPUBLIC Open Data directory.
  verify-files        Verify downloaded data file integrity.
  version             Return cernopendata-client version.
```

Then list the file locations for record 75124:
```bash
docker run -i -t --rm docker.io/cernopendata/cernopendata-client get-file-locations --recid 75124
```

```output
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/65A0736B-22F3-C94C-99AE-36717B28629C.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/6E508763-A12F-8846-A295-F39EE7DDAA52.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2530000/7B2D5CD5-9CAE-C046-A9AB-50CE9D48B187.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/260000/1A50245D-8213-6340-8EA0-CB064EEC6AF3.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/09FA6C37-21D6-7846-B3E1-F8086CBA0E9E.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/3AAB5B1E-7169-9C4D-841C-CB2D6E40CBAE.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/820C3EBC-0E1D-CE41-9418-FA1615123FC2.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/CF54D079-349C-FB4F-B6E5-3D579D89EDE4.root
http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/80000/E964C281-43FB-D349-A436-9A3FDA0BAA28.root
```
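
Each location in the listing encodes the dataset name and data tier in its path. The following sketch splits one such URL into labelled parts. It assumes the path always ends with campaign, dataset, tier, processing string, sub-directory and file name, as read off the listing above. The helper `describe_file_location` is hypothetical and not part of `cernopendata-client`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def describe_file_location(url):
    """Split one file location into labelled path components."""
    parts = PurePosixPath(urlsplit(url).path).parts
    # Assumption: the last six components follow the layout in the listing
    campaign, dataset, tier, processing, subdir, filename = parts[-6:]
    return {
        "campaign": campaign,
        "dataset": dataset,
        "tier": tier,
        "file": filename,
    }


location = (
    "http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/"
    "ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/"
    "106X_mcRun2_asymptotic_v17-v2/2520000/65A0736B-22F3-C94C-99AE-36717B28629C.root"
)
info = describe_file_location(location)
for key, value in info.items():
    print(f"{key}: {value}")
```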
:::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::: challenge

## Exercise 2: Inspect the data file

:::::::::::::: solution


::::::::::::::

::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- Use `.md` files for episodes when you want static content
- Use `.Rmd` files for episodes when you need to generate output
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
- Run `sandpaper::build_lesson()` to preview your lesson locally

::::::::::::::::::::::::::::::::::::::::::::::::

[r-markdown]: https://rmarkdown.rstudio.com/
15 changes: 15 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,15 @@
---
title: "Contributor Code of Conduct"
---

As contributors and maintainers of this project,
we pledge to follow the [The Carpentries Code of Conduct][coc].

All workshop participants are also expected to follow the [CERN Code of Conduct](https://cds.cern.ch/record/2240689/files/BrochureCodeofConductEN.pdf?version=1)

Instances of abusive, harassing, or otherwise unacceptable behavior
may be reported by following our [reporting guidelines][coc-reporting].


[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html