From 83ba4543a25f85cc801118662268c31530a14268 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Tue, 23 Jul 2024 20:53:55 +0000 Subject: [PATCH] markdown source builds Auto-generated via {sandpaper} Source : be339266d0e8dcf28ab3a385b8e1d14850b47223 Branch : main Author : Kati Lassila-Perini Time : 2024-07-23 20:53:17 +0000 Message : Merge pull request #4 from cms-opendata-workshop/klp-02-mini-info 02: add mini info, minor fixes in 01 --- 01-introduction.md | 8 ++-- 02-nanoaod-miniaod.md | 93 +++++++++++++++++++++++++++++++++++++------ md5sum.txt | 4 +- 3 files changed, 86 insertions(+), 19 deletions(-) diff --git a/01-introduction.md b/01-introduction.md index c799b3d..bf4e284 100644 --- a/01-introduction.md +++ b/01-introduction.md @@ -23,7 +23,7 @@ exercises: 5 Most previous releases of CMS open data have been in the Analysis Object Data (AOD) format. This is a complex format and specific CMS software (CMSSW) is required in order to read and analyze it. -From 2015 data releases have been a slimmed-down format called MiniAOD, which has the same essential structure and software requirements for analysis as AOD. Essentially there are few +From 2015 data releases have been a slimmed-down format called MiniAOD, which has the same essential structure and software requirements for analysis as AOD. Essentially there are fewer physics object collections and often the physics objects themselves are different. For data released in 2016 and beyond a new format called NanoAOD is used. NanoAOD is not just simply slimmed-down MiniAOD. In contrast to AOD and MiniAOD which is stored in CMSSW C++ objects, NanoAOD is stored using ROOT TTree objects. You therefore do not need to use the CMS Virtual Machine or docker container to analyze NanoAOD data. NanoAOD can be analyzed using the ROOT program and/or python libraries capable of interpreting the ROOT's TTree structure. @@ -40,13 +40,13 @@ Let us now make sure that you can find that information. 
## Exercise 1: Find the NanoAOD variable description for a physics object -Select a physics objects of your choice in the [CMS Physics Objects lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) and find the corresponding variable listing from a CMS dataset record on the [CERN Open Data portal](https://opendata.cern.ch/). +Select a physics object of your choice in the [CMS Physics Objects lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) and find the corresponding variable listing from a CMS dataset record on the [CERN Open Data portal](https://opendata.cern.ch/). :::::::::::::: solution Find the NanoAOD variable listing for example for the [SingleElectron collision dataset from 2016 RunG](https://opendata.cern.ch/record/30529). Scroll down to "Dataset semantics" and open the [variable list](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/NanoAOD/30529/SingleElectron_doc.html). -Find the links to the physics object collections under "Events Content" and find the object of your choice. +Find the links to the physics object collections under "Events Content" and find the object of your choice. Read the object descriptions provided in the [CMS Physics Objects pre-learning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html). :::::::::::::: @@ -56,7 +56,7 @@ Find the links to the physics object collections under "Events Content" and find ## Exercise 2: Compare variable lists in different collision datasets. -Find all collision datasets from 2016 in NanoAOD format. Compare the variable list. Do Muon datasets contain an electron collection? Do Electron datasets contain a muon collection? Why? +Find all collision datasets from 2016 in NanoAOD format. Compare the variable list. Do Muon datasets contain an electron collection? Do Electron datasets contain a muon collection? 
:::::::::::::: solution diff --git a/02-nanoaod-miniaod.md b/02-nanoaod-miniaod.md index 3806d45..d411318 100644 --- a/02-nanoaod-miniaod.md +++ b/02-nanoaod-miniaod.md @@ -1,7 +1,7 @@ --- title: "Differences between NanoAOD and MiniAOD" teaching: 10 -exercises: 5 +exercises: 10 --- :::::::::::::::::::::::::::::::::::::: questions @@ -21,7 +21,7 @@ exercises: 5 ## What are the differences between NanoAOD and MiniAOD -In the previous episode, we found the description of the NanoAOD variables. +In the previous episode, we found the description of the NanoAOD variables. If you browse the listing, you will notice that all variables are of fundamental types (floating-point numbers, integers, Boolean values, characters). Let us now compare it to the MiniAOD format. Note that the variable descriptions are not available attached to the datasets, but we can have a look at the [MiniAOD description in the CMS WorkBook](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#High_level_physics_objects). @@ -29,28 +29,95 @@ You will see a table starting with: ![](fig/MiniAODTable.png){alt='MiniAOD descripion in the CMS WorkBook'} -The objects in the MiniAOD format are C++ classes in CMSSW, the CMS Software package, and the table gives the class name. We can find the exact class description in the CMSSW reference manual. See, for example +The objects in the MiniAOD format are C++ classes in CMSSW, the CMS Software package, and the table gives the class name corresponding to the physics object. We can find the exact class description in the CMSSW reference manual. See, for example -- [`pat::Muon`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d6/d13/classpat_1_1Muon.html) -- [`pat::Electron`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d2/d1f/classpat_1_1Electron.html). 
+- Muons: [`pat::Muon`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d6/d13/classpat_1_1Muon.html) +- Electrons: [`pat::Electron`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d2/d1f/classpat_1_1Electron.html). -These are C++ classes that can *inherit* information from parent classes or contain objects, of some complex types. Therefore, some of the variables are not explicitly listed as they are available through other objects. +These are C++ classes that can *inherit* information from parent classes, or contain objects of some complex types. Therefore, some of the variables are not explicitly listed as they are available through other objects. -For example, for MiniAOD, we will not find `eta` or `pt` explicitly in the class description as they can be obtained through the `LorentzVector` object. This is transparent in the code when accessing those values, but much less so in the documentation! +::::::::::::::::::::: challenge -Let us now compare it to NanoAOD. The major difference is that MiniAOD contains most of the constituents of a physics object (such as tracks and/or calorimeter clusters) whereas NanoAOD only contains some information about them. +## Exercise 1: Find NanoAOD variables in MiniAOD -## NanoAOD with particle flow candidates +Compare the physics object information available in NanoAOD and MiniAOD. -Many CMS open data users have relied on the [Particle flow information](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#Packed_ParticleFlow_Candidates), available in the MiniAOD format but not in the NanoAOD format. See the class description: [`pat::Packe](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d8/d79/classpat_1_1PackedCandidate.html). +Can you find the basic variables such as `charge`, `eta` and `pt` for electrons? 
-TO-DO -find them and compare variable lists + +:::::::::::::: solution + +For NanoAOD, see for example the [SingleElectron dataset](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/NanoAOD/30529/SingleElectron_doc.html). + +For MiniAOD, read the general description in the [WorkBook](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#High_level_physics_objects) and open the reference page for [`pat::Electron`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d2/d1f/classpat_1_1Electron.html). + +For MiniAOD, we will not find `eta` or `pt` explicitly in the class description as they can be obtained through the `LorentzVector` object. This is transparent in the code when accessing those values, but much less so in the documentation! + +:::::::::::::: + +:::::::::::::::::::: + +Let us now see what information is in MiniAOD but not in NanoAOD. The major difference is that MiniAOD contains most of the constituents of a physics object (such as tracks and/or calorimeter clusters) whereas NanoAOD only contains some information about them. + +::::::::::::::::::::: challenge + +## Exercise 2: Find MiniAOD information that is not in NanoAOD + +Compare the physics object information available in NanoAOD and MiniAOD. + +Find information about the calorimeter cluster and the track connected to an electron. + + +:::::::::::::: solution + +In MiniAOD, access to the track information is provided through a member function `gsfTrack`. + +The full track information is not available in NanoAOD, but the most pertinent information from its associated track is its impact parameter with respect to the primary interaction vertex. This information is available in NanoAOD, read more about it in the [pre-learning material](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/02-electrons.html#electron-4-vector-and-track-information). 
+ +:::::::::::::: + +:::::::::::::::::::: + +## NanoAOD with Particle Flow candidates + +Many CMS open data users have relied on the [Particle Flow information](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#Packed_ParticleFlow_Candidates), available in the AOD and MiniAOD formats but not in the NanoAOD format. See the class description: [`pat::PackedCandidate`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d8/d79/classpat_1_1PackedCandidate.html). + +For the 2016 collision data, a selection of datasets has been processed in NanoAOD format enhanced with Particle Flow information. These datasets can be used in the same way as the usual NanoAOD datasets; they simply contain more information. + +::::::::::::::::::::: challenge + +## Exercise 3: Find the datasets in NanoAOD format enhanced with Particle Flow information + +Use the [CERN Open Data portal search facets](https://opendata.cern.ch/search?q=&l=list&order=desc&p=1&s=10&sort=mostrecent) to find these derived datasets. + +Hint: look at the options under "File type". + +Compare the variable list with the standard NanoAOD. + +:::::::::::::: solution + +You can find them by [searching `nanoaod-pf`](https://opendata.cern.ch/search?q=&f=file_type%3Ananoaod-pf&l=list&order=desc&p=1&s=10&sort=mostrecent). + +Select one and find the variable list. Note the section called [`PFCands`](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/derived-data/PFNano/31312/SingleElectron_doc.html#PFCands) with information about Particle Flow candidates. + +:::::::::::::: + +:::::::::::::::::::: + +An [example workflow](https://opendata.cern.ch/record/12504) is provided (and linked to the dataset record) to show how other datasets can be processed into this enhanced format. In principle, it can be used as is: change the MiniAOD input dataset name and run the code in the CMSSW Docker container. 
+ +::::::::::::: callout + +## Processing MiniAOD to custom NanoAOD takes time and resources + +Processing an entire MiniAOD to custom NanoAOD (i.e. selecting your own objects of interest in addition to those already available in NanoAOD) takes computing resources well beyond a single computer. + +::::::::::::: ## Using MiniAOD -Demo only, +If you need the maximum coverage of CMS provides all that is needed to use data in the MiniAOD format show container diff --git a/md5sum.txt b/md5sum.txt index 99bfc9a..ef9506b 100644 --- a/md5sum.txt +++ b/md5sum.txt @@ -4,8 +4,8 @@ "config.yaml" "ca0c1d383008c0511dbd28fe4963d528" "site/built/config.yaml" "2024-07-23" "index.md" "a02c9c785ed98ddd84fe3d34ddb12fcd" "site/built/index.md" "2024-06-19" "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-06-19" -"episodes/01-introduction.md" "3428634c476261e7e2452f5fc527dc81" "site/built/01-introduction.md" "2024-07-23" -"episodes/02-nanoaod-miniaod.md" "7c100cc9e1ef303b6a4c3deada4597b4" "site/built/02-nanoaod-miniaod.md" "2024-07-23" +"episodes/01-introduction.md" "071ce7b92b927ab433dce73a8449fd9a" "site/built/01-introduction.md" "2024-07-23" +"episodes/02-nanoaod-miniaod.md" "36cb52df509a78d22defdad51494542e" "site/built/02-nanoaod-miniaod.md" "2024-07-23" "episodes/03-nanoaod-dataset.md" "286712650ebb187b2c69f5aa00492a37" "site/built/03-nanoaod-dataset.md" "2024-07-23" "episodes/04-nanoaod-exercises.md" "2bc567228a634cb85f32cf1838941cb3" "site/built/04-nanoaod-exercises.md" "2024-07-23" "instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-06-19"