diff --git a/01-introduction.md b/01-introduction.md new file mode 100644 index 0000000..bf4e284 --- /dev/null +++ b/01-introduction.md @@ -0,0 +1,79 @@ +--- +title: "Introduction" +teaching: 10 +exercises: 5 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What have we learned in the pre-learning lessons and how can we apply it? +- Where do we find information about physics objects in the CMS NanoAOD format? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Apply what we have learned in the pre-learning lessons about CMS physics objects +- Learn about the documentation of the NanoAOD format + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Data formats in CMS + +Most previous releases of CMS open data have been in the Analysis Object Data (AOD) format. +This is a complex format and specific CMS software (CMSSW) is required in order to read and analyze it. + +From 2015 data releases have been in a slimmed-down format called MiniAOD, which has the same essential structure and software requirements for analysis as AOD. Essentially there are fewer +physics object collections and often the physics objects themselves are different. + +For data released in 2016 and beyond a new format called NanoAOD is used. NanoAOD is not simply a slimmed-down MiniAOD. In contrast to AOD and MiniAOD, which are stored in CMSSW C++ objects, NanoAOD is stored using ROOT TTree objects. You therefore do not need to use the CMS Virtual Machine or Docker container to analyze NanoAOD data. NanoAOD can be analyzed using the ROOT program and/or Python libraries capable of interpreting ROOT's TTree structure. + +In this workshop we will focus on working with open data in the NanoAOD format. 
+ +## Physics objects in CMS data + +The recommended [CMS Physics Objects prelearning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) guides you through different physics objects and explains what information is available for them in the CMS NanoAOD format. + +Let us now make sure that you can find that information. + +::::::::::::::::::::: challenge + +## Exercise 1: Find the NanoAOD variable description for a physics object + +Select a physics object of your choice in the [CMS Physics Objects lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html) and find the corresponding variable listing from a CMS dataset record on the [CERN Open Data portal](https://opendata.cern.ch/). + +:::::::::::::: solution + +Find the NanoAOD variable listing for example for the [SingleElectron collision dataset from 2016 RunG](https://opendata.cern.ch/record/30529). Scroll down to "Dataset semantics" and open the [variable list](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/NanoAOD/30529/SingleElectron_doc.html). + +Find the links to the physics object collections under "Events Content" and find the object of your choice. Read the object descriptions provided in the [CMS Physics Objects pre-learning lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/index.html). + +:::::::::::::: + +:::::::::::::::::::: + +::::::::::::::::::::: challenge + +## Exercise 2: Compare variable lists in different collision datasets. + +Find all collision datasets from 2016 in NanoAOD format. Compare the variable list. Do Muon datasets contain an electron collection? Do Electron datasets contain a muon collection? + +:::::::::::::: solution + +Use the search facets of the [search page](https://opendata.cern.ch/search?q=&l=list&order=desc&p=1&s=10&sort=mostrecent). 
+ +Select **Collision** under Dataset, **CMS** under Experiment, **2016** under "Year", and **nanoaod** under File type. + +Open two different collision datasets and check their variable lists. + +:::::::::::::: + +:::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- The variable list with a variable brief description is linked to all CMS NanoAOD datasets. +- CMS Physics Objects pre-learning lesson describes different physics object variables in more detail. + +:::::::::::::::::::::::::::::::::::::::::::::::: + diff --git a/02-nanoaod-miniaod.md b/02-nanoaod-miniaod.md new file mode 100644 index 0000000..65f740e --- /dev/null +++ b/02-nanoaod-miniaod.md @@ -0,0 +1,261 @@ +--- +title: "Differences between NanoAOD and MiniAOD" +teaching: 10 +exercises: 10 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What is the structure and content of the NanoAOD format? +- How is it different from MiniAOD? +- What if the required information is not available in the NanoAOD format? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Learn about the structure and content of NanoAOD and how it differs from MiniAOD +- Learn where to find information on how to use MiniAOD + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## What are the differences between NanoAOD and MiniAOD + +In the previous episode, we found the description of the NanoAOD variables. If you browse the listing, you will notice that all variables are of fundamental types (floating-point numbers, integers, Boolean values, characters). + +Let us now compare it to the MiniAOD format. Note that the variable descriptions are not available attached to the datasets, but we can have a look at the [MiniAOD description in the CMS WorkBook](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#High_level_physics_objects). 
+ +You will see a table starting with: + +![](fig/MiniAODTable.png){alt='MiniAOD description in the CMS WorkBook'} + +The objects in the MiniAOD format are C++ classes in CMSSW, the CMS Software package, and the table gives the class name corresponding to the physics object. We can find the exact class description in the CMSSW reference manual. See, for example + +- Muons: [`pat::Muon`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d6/d13/classpat_1_1Muon.html) +- Electrons: [`pat::Electron`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d2/d1f/classpat_1_1Electron.html). + +These are C++ classes that can *inherit* information from parent classes, or contain objects of some complex types. Therefore, some of the variables are not explicitly listed as they are available through other objects. + +::::::::::::::::::::: challenge + +## Exercise 1: Find NanoAOD variables in MiniAOD + +Compare the physics object information available in NanoAOD and MiniAOD. + +Can you find the basic variables such as `charge`, `eta` and `pt` for electrons? + + +:::::::::::::: solution + +For NanoAOD, see for example the [SingleElectron dataset](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/NanoAOD/30529/SingleElectron_doc.html). + +For MiniAOD, read the general description in the [WorkBook](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#High_level_physics_objects) and open the reference page for [`pat::Electron`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d2/d1f/classpat_1_1Electron.html). + +For MiniAOD, we will not find `eta` or `pt` explicitly in the class description as they can be obtained through the `LorentzVector` object. This is transparent in the code when accessing those values, but much less so in the documentation! + +:::::::::::::: + +:::::::::::::::::::: + +Let us now see what information is in MiniAOD but not in NanoAOD. 
The major difference is that MiniAOD contains most of the constituents of a physics object (such as tracks and/or calorimeter clusters) whereas NanoAOD only contains some information about them. + +::::::::::::::::::::: challenge + +## Exercise 2: Find MiniAOD information that is not in NanoAOD + +Compare the physics object information available in NanoAOD and MiniAOD. + +Find information about the calorimeter cluster and the track connected to an electron. + + +:::::::::::::: solution + +In MiniAOD, access to the track information is provided through a member function `gsfTrack`. + +The full track information is not available in NanoAOD, but the most pertinent quantity from the electron's associated track, its impact parameter with respect to the primary interaction vertex, is available. Read more about it in the [pre-learning material](https://cms-opendata-workshop.github.io/workshop2024-lesson-physics-objects/instructor/02-electrons.html#electron-4-vector-and-track-information). + +:::::::::::::: + +:::::::::::::::::::: + +## NanoAOD with Particle Flow candidates + +Many CMS open data users have relied on the [Particle Flow information](https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016#Packed_ParticleFlow_Candidates), available in the AOD and MiniAOD formats but not in the NanoAOD format. See the class description: [`pat::PackedCandidate`](https://cmsdoxygen.web.cern.ch/cmsdoxygen/CMSSW_10_6_25/doc/html/d8/d79/classpat_1_1PackedCandidate.html). + +For the 2016 collision data, a selection of datasets has been processed in NanoAOD format enhanced with Particle Flow information. These datasets can be used in the same way as the usual NanoAOD datasets; they just contain more information. 
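Because the enhanced files keep the flat NanoAOD layout, comparing two variable lists comes down to comparing sets of branch names. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the branch lists are hand-typed for illustration (the `PFCands_*` entries are only examples; the authoritative names are in the variable list linked from each dataset record).

```python
def collections(branch_names):
    """Group flat NanoAOD-style branch names by the prefix before the first underscore."""
    groups = {}
    for name in branch_names:
        prefix, _, variable = name.partition("_")
        # Names without an underscore (counters such as nElectron) are skipped.
        if variable:
            groups.setdefault(prefix, []).append(variable)
    return groups


def extra_collections(standard, enhanced):
    """Return the collection names that appear only in the enhanced list, sorted."""
    return sorted(set(collections(enhanced)) - set(collections(standard)))


# Illustrative inputs only (assumption): not copied from a real file.
standard = ["Electron_pt", "Electron_eta", "Muon_pt", "nElectron"]
enhanced = standard + ["PFCands_pt", "PFCands_eta"]

print(extra_collections(standard, enhanced))
```

With real data, the two lists would come from the variable listings or from the branch names of the opened files rather than being typed by hand.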
+ +::::::::::::::::::::: challenge + +## Exercise 3: Find the datasets in NanoAOD format enhanced with Particle Flow information + +Use the [CERN Open Data portal search facets](https://opendata.cern.ch/search?q=&l=list&order=desc&p=1&s=10&sort=mostrecent) to find these derived datasets. + +Hint: look at the options under "File type". + +Compare the variable list with the standard NanoAOD. + +:::::::::::::: solution + +You can find them by [searching `nanoaod-pf`](https://opendata.cern.ch/search?q=&f=file_type%3Ananoaod-pf&l=list&order=desc&p=1&s=10&sort=mostrecent). + +Select one, find the variable list. Note the section called [`PFCands`](https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/derived-data/PFNano/31312/SingleElectron_doc.html#PFCands) with information about Particle Flow candidates. + +:::::::::::::: + +:::::::::::::::::::: + +An [example workflow](https://opendata.cern.ch/record/12504) is provided (and linked to the dataset record) to show how other datasets can be processed into this enhanced format. In principle, it can be used as such, just changing the MiniAOD input dataset name, executing the code in the CMSSW Docker container. + +::::::::::::: callout + +## Processing MiniAOD to custom NanoAOD takes time and resources + +Processing an entire MiniAOD to custom NanoAOD (i.e. selecting your own objects of interest in addition to those already available in NanoAOD) takes computing resources well beyond a single computer. + +::::::::::::: + + +## Using MiniAOD + +If you need the maximum coverage of CMS physics objects, know that CMS provides all that is needed to use data in the MiniAOD format. + +You would first find the [CMSSW container image](https://opendata.cern.ch/docs/cms-guide-docker#images) with a version corresponding to the data release, and you would [start a container](https://opendata.cern.ch/docs/cms-guide-docker#mount) in a similar manner as you did for the Python tools and Root containers that we will use in the workshop. 
+ +We will **not** work through it now but if you started the CMSSW container, you would get a container prompt with the CMSSW working area: + +```bash +CMSSW should now be available. +This is a standalone image for CMSSW_10_6_30 slc7_amd64_gcc700. +(/code/CMSSW_10_6_30/src) +``` + +In this environment, you would be able to follow the instructions in [Getting started with miniAOD](https://opendata.cern.ch/docs/cms-getting-started-miniaod), and, for example, inspect the event content with CMSSW tools, e.g. with `edmDumpEventContent`. + +:::::::::::::::: spoiler + +### See the MiniAOD event content + +Find a file name in the file listing of the [SingleElectron MiniAOD record](https://opendata.cern.ch/record/30512) and dump its contents with + +```bash +(/code/CMSSW_10_6_30/src) edmDumpEventContent root://eospublic.cern.ch//eos/opendata/cms/Run2016G/SingleElectron/MINIAOD/UL2016_MiniAODv2-v2/120000/0014ADC0-08B8-1347-B496-CDB3A3A32317.root +Type Module Label Process +---------------------------------------------------------------------------------------------- +edm::TriggerResults "TriggerResults" "" "HLT" +BXVector "gtStage2Digis" "" "RECO" +BXVector "gtStage2Digis" "" "RECO" +BXVector "caloStage2Digis" "EGamma" "RECO" +BXVector "caloStage2Digis" "EtSum" "RECO" +BXVector "caloStage2Digis" "Jet" "RECO" +BXVector "gmtStage2Digis" "Muon" "RECO" +BXVector "caloStage2Digis" "Tau" "RECO" +HcalNoiseSummary "hcalnoise" "" "RECO" +L1GlobalTriggerReadoutRecord "gtDigis" "" "RECO" +double "fixedGridRhoAll" "" "RECO" +double "fixedGridRhoFastjetAll" "" "RECO" +double "fixedGridRhoFastjetAllCalo" "" "RECO" +double "fixedGridRhoFastjetAllTmp" "" "RECO" +double "fixedGridRhoFastjetCentral" "" "RECO" +double "fixedGridRhoFastjetCentralCalo" "" "RECO" +double "fixedGridRhoFastjetCentralChargedPileUp" "" "RECO" +double "fixedGridRhoFastjetCentralNeutral" "" "RECO" +edm::TriggerResults "TriggerResults" "" "RECO" +reco::BeamHaloSummary "BeamHaloSummary" "" "RECO" +reco::BeamSpot 
"offlineBeamSpot" "" "RECO" +reco::CSCHaloData "CSCHaloData" "" "RECO" +vector "ctppsLocalTrackLiteProducer" "" "RECO" +vector "scalersRawToDigi" "" "RECO" +vector "l1extraParticles" "Isolated" "RECO" +vector "l1extraParticles" "NonIsolated" "RECO" +vector "l1extraParticles" "MET" "RECO" +vector "l1extraParticles" "MHT" "RECO" +vector "l1extraParticles" "" "RECO" +vector "l1extraParticles" "Central" "RECO" +vector "l1extraParticles" "Forward" "RECO" +vector "l1extraParticles" "IsoTau" "RECO" +vector "l1extraParticles" "Tau" "RECO" +vector "l1extraParticles" "" "RECO" +vector "gsfTracksOpenConversions" "gsfTracksOpenConversions" "RECO" +vector "ctppsProtons" "multiRP" "RECO" +vector "ctppsProtons" "singleRP" "RECO" +vector "displacedStandAloneMuons" "" "RECO" +BXVector "simGtExtUnprefireable" "" "PAT" +double "prefiringweight" "nonPrefiringProb" "PAT" +double "prefiringweight" "nonPrefiringProbDown" "PAT" +double "prefiringweight" "nonPrefiringProbUp" "PAT" +edm::Association > "isolatedTracks" "" "PAT" +edm::OwnVector > "slimmedMuonTrackExtras" "" "PAT" +edm::OwnVector > "slimmedJetsPuppi" "tagInfos" "PAT" +edm::RangeMap >,edm::ClonePolicy > "slimmedMuons" "" "PAT" +edm::RangeMap >,edm::ClonePolicy > "slimmedMuons" "" +"PAT" +edm::SortedCollection > "reducedEgamma" "reducedEBRecHits" "PAT" +edm::SortedCollection > "reducedEgamma" "reducedEERecHits" "PAT" +edm::SortedCollection > "reducedEgamma" "reducedESRecHits" "PAT" +edm::SortedCollection > "reducedEgamma" "reducedHBHEHits" "PAT" +edm::SortedCollection > "slimmedHcalRecHits" "reducedHcalRecHits" "PAT" +edm::SortedCollection > "slimmedHcalRecHits" "reducedHcalRecHits" "PAT" +edm::SortedCollection > "slimmedHcalRecHits" "reducedHcalRecHits" "PAT" +edm::TriggerResults "TriggerResults" "" "PAT" +edm::ValueMap "offlineSlimmedPrimaryVertices" "" "PAT" +pat::PackedTriggerPrescales "patTrigger" "" "PAT" +pat::PackedTriggerPrescales "patTrigger" "l1max" "PAT" +pat::PackedTriggerPrescales "patTrigger" "l1min" "PAT" +vector 
"ctppsLocalTrackLiteProducer" "" "PAT" +vector "oniaPhotonCandidates" "conversions" "PAT" +vector "slimmedElectrons" "" "PAT" +vector "slimmedLowPtElectrons" "" "PAT" +vector "isolatedTracks" "" "PAT" +vector "slimmedJets" "" "PAT" +vector "slimmedJetsAK8" "" "PAT" +vector "slimmedJetsPuppi" "" "PAT" +vector "slimmedJetsAK8PFPuppiSoftDropPacked" "SubJets" "PAT" +vector "slimmedMETs" "" "PAT" +vector "slimmedMETsNoHF" "" "PAT" +vector "slimmedMETsPuppi" "" "PAT" +vector "slimmedMuons" "" "PAT" +vector "lostTracks" "" "PAT" +vector "packedPFCandidates" "" "PAT" +vector "lostTracks" "eleTracks" "PAT" +vector "slimmedOOTPhotons" "" "PAT" +vector "slimmedPhotons" "" "PAT" +vector "slimmedTaus" "" "PAT" +vector "slimmedTausBoosted" "" "PAT" +vector "slimmedPatTrigger" "" "PAT" +vector "reducedEgamma" "reducedEBEEClusters" "PAT" +vector "reducedEgamma" "reducedESClusters" "PAT" +vector "reducedEgamma" "reducedOOTEBEEClusters" "PAT" +vector "reducedEgamma" "reducedOOTESClusters" "PAT" +vector "slimmedCaloJets" "" "PAT" +vector "reducedEgamma" "reducedConversions" "PAT" +vector "reducedEgamma" "reducedSingleLegConversions" "PAT" +vector "isolatedTracks" "" "PAT" +vector "ctppsProtons" "multiRP" "PAT" +vector "ctppsProtons" "singleRP" "PAT" +vector "reducedEgamma" "reducedGedGsfElectronCores" "PAT" +vector "reducedEgamma" "reducedGsfTracks" "PAT" +vector "reducedEgamma" "reducedGedPhotonCores" "PAT" +vector "reducedEgamma" "reducedOOTPhotonCores" "PAT" +vector "reducedEgamma" "reducedOOTSuperClusters" "PAT" +vector "reducedEgamma" "reducedSuperClusters" "PAT" +vector "slimmedMuonTrackExtras" "" "PAT" +vector "offlineSlimmedPrimaryVertices" "" "PAT" +vector "slimmedKshortVertices" "" "PAT" +vector "slimmedLambdaVertices" "" "PAT" +vector "slimmedSecondaryVertices" "" "PAT" +vector "slimmedPatTrigger" "filterLabels" "PAT" +unsigned int "bunchSpacingProducer" "" "PAT" +``` + +:::::::::::::::::::::::: + +You would follow the 
[instructions](https://opendata.cern.ch/docs/cms-getting-started-miniaod#data) to build a CMSSW analyzer module of your own to select the events and physics objects of interest, compile the code and run the analysis in the container. The CMSSW output files are in the ROOT format and you could use the Python tools or the Root container to analyze them further. + +::::::::::::::::::::::::::::::::::::: keypoints + +- Analyses that require detailed information about physics object constituents may require using MiniAOD instead of NanoAOD +- Selected datasets that include Particle Flow candidates in an enriched NanoAOD format are available, and their use does not require CMS-specific software +- The CMSSW environment is available as a Docker container and can be used to work with MiniAOD + + +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/03-nanoaod-dataset.md b/03-nanoaod-dataset.md new file mode 100644 index 0000000..adf286d --- /dev/null +++ b/03-nanoaod-dataset.md @@ -0,0 +1,80 @@ +--- +title: "NanoAOD datasets" +teaching: 10 +exercises: 0 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How do we find a specific nanoAOD dataset? +- How do we explore the content of our nanoAOD dataset? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Know how to find nanoAOD datasets +- Know how to explore the content of nanoAOD + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Find and explore a nanoAOD dataset + +Let's find and explore a particular dataset which we will look into further +later: simulated Z' events in which the Z' decays to a top and antitop quark pair. + +:::::: callout +A Z' ("Z-prime") is a hypothetical heavy gauge boson that could come from +extensions of the Standard Model. 
A review of searches for the Z' +can be found [here](https://pdg.lbl.gov/2024/reviews/rpp2024-rev-zprime-searches.pdf) +:::::: + +### Find the dataset + +All data can be found via the [CERN Open Data Portal](https://opendata.cern.ch). +Let's go to the website and search the simulated Z' datasets. + +Dataset naming in CMS can seem obscure but let's do something simple and search for "Zprime*": + +![](fig/ZprimeODP.png){alt='Search for Zprime* at the CODP'} + +The query results are [here](https://opendata.cern.ch/search?q=Zprime%2A&l=list&order=asc&p=1&s=10&sort=bestmatch) and you can see that there are many (over 1000) records returned: + +![](fig/ZprimeODP-results.png){alt='Search results for Zprime*'} + +Let's narrow down the results and select "Type: Dataset", "Experiment: CMS", "Year: 2016", "File type: nanoaodsim", and "Category: Heavy Gauge Bosons". We've now reduced the number of [matches](https://opendata.cern.ch/search?q=Zprime%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch) from over 1000 down to 210: + +![](fig/ZprimeODP-results2.png){alt='Narrowed search results for Zprime*'} + +We can discern some of the logic behind the simulated dataset naming. "Zprime" is the particle produced and it decays to various products. We want $Z^{'} \rightarrow t\bar{t}$ which shows up as the third result so let's [narrow the search](https://opendata.cern.ch/search?q=ZprimeToTT%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch) further and search with "ZprimeToTT*": + +![](fig/ZprimeToTT-results.png){alt='Narrowed search results for Zprime*'} + +We can also discern that the dataset names also include the mass (in GeV) of the hypothetical Z' (e.g. "_M2000"). + +TO-DO: what do the other strings mean in the dataset name? 
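The narrowed search and the mass tag described above can also be handled programmatically. This is a minimal sketch, not an official portal API: `search_url` and `zprime_mass_gev` are hypothetical helper names, the facet strings are copied from the search links in this episode, and the display parameters (`l`, `order`, `p`, `s`, `sort`) are left out on the assumption that they only affect how results are listed.

```python
import re
from urllib.parse import quote, urlencode

# Facets copied from the narrowed search in this episode.
FACETS = [
    "experiment:CMS",
    "year:2016",
    "file_type:nanoaodsim",
    "category:Exotica+subcategory:Heavy Gauge Bosons",
    "type:Dataset",
]


def search_url(query, facets=FACETS):
    """Build a CERN Open Data portal search URL for a query and a list of facets."""
    params = [("q", query)] + [("f", facet) for facet in facets]
    # quote (rather than the default quote_plus) encodes spaces as %20, as in the links above.
    return "https://opendata.cern.ch/search?" + urlencode(params, quote_via=quote)


def zprime_mass_gev(dataset_name):
    """Extract the mass in GeV from a tag such as '_M2000'; return None if absent."""
    match = re.search(r"_M(\d+)(?:_|$)", dataset_name)
    return int(match.group(1)) if match else None


print(search_url("ZprimeToTT*"))
print(zprime_mass_gev("ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8"))
```

Only the mass tag is parsed here, since it is the only part of the dataset name whose meaning is stated in this episode.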
+ +Possible challenge: have them select a mass and search for the dataset and select a file for the next part. + +Next, let's use the `cernopendata-client` command-line tool to find the datasets +and fetch a file. + +### Explore a file + +:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor + +Inline instructor notes can help inform instructors of timing challenges +associated with the lessons. They appear in the "Instructor View" + +:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- Use `.md` files for episodes when you want static content +- Use `.Rmd` files for episodes when you need to generate output +- Run `sandpaper::check_lesson()` to identify any issues with your lesson +- Run `sandpaper::build_lesson()` to preview your lesson locally + +:::::::::::::::::::::::::::::::::::::::::::::::: + +[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/04-nanoaod-exercises.md b/04-nanoaod-exercises.md new file mode 100644 index 0000000..af0e138 --- /dev/null +++ b/04-nanoaod-exercises.md @@ -0,0 +1,107 @@ +--- +title: "NanoAOD exercises" +teaching: 10 +exercises: 0 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What have we learned in the pre-exercises and how can we apply it? +- What is the structure and content of the nanoAOD format? 
+ +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Apply what we have learned in the pre-exercises +- Learn about the structure and content of nanoAOD + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Exercises with NanoAOD + +::::::::::::::::::::::::::::::::::::: challenge + +## Exercise 1: Get data file locations + +Let's select a ZprimeToTT sample for a given mass and use +the `cernopendata-client` to get the associated data files. + +Recall what you've learned from the [pre-exercise](https://cms-opendata-workshop.github.io/workshop2024-lesson-dataset-scouting/instructor/04-cli-through-cernopendata-client.html) on the `cernopendata-client`. + +:::::::::::::::::::::::: solution + +## Solution + +Search for the ZprimeToTT samples in the CERN Open Data Portal. The resulting query is [here](https://opendata.cern.ch/search?q=ZprimeToTT%2A&f=experiment%3ACMS&f=year%3A2016&f=file_type%3Ananoaodsim&f=category%3AExotica%2Bsubcategory%3AHeavy%20Gauge%20Bosons&f=type%3ADataset&l=list&order=asc&p=1&s=10&sort=bestmatch). + +Next, select a dataset. Here we fetch [this one](https://opendata.cern.ch/record/75124), record 75124, "Simulated dataset ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8 in NANOAODSIM format for 2016 collision data" where the Z' mass is 1000 GeV. + +Fetch the Docker image for the `cernopendata-client`: + +```bash +docker pull docker.io/cernopendata/cernopendata-client +``` + +and refresh your memory on the commands: +```bash +docker run -i -t --rm docker.io/cernopendata/cernopendata-client --help +``` +```output +Usage: cernopendata-client [OPTIONS] COMMAND [ARGS]... + + Command-line client for interacting with CERN Open Data portal. + +Options: + --help Show this message and exit. + +Commands: + download-files Download data files belonging to a record. + get-file-locations Get a list of data file locations of a record. + get-metadata Get metadata content of a record. 
+ list-directory List contents of a EOSPUBLIC Open Data directory. + verify-files Verify downloaded data file integrity. + version Return cernopendata-client version. +``` + +Then list the file locations for record 75124: +```bash +cernopendata-client get-file-locations --recid 75124 +``` + +```output +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/65A0736B-22F3-C94C-99AE-36717B28629C.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2520000/6E508763-A12F-8846-A295-F39EE7DDAA52.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/2530000/7B2D5CD5-9CAE-C046-A9AB-50CE9D48B187.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/260000/1A50245D-8213-6340-8EA0-CB064EEC6AF3.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/09FA6C37-21D6-7846-B3E1-F8086CBA0E9E.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/3AAB5B1E-7169-9C4D-841C-CB2D6E40CBAE.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/820C3EBC-0E1D-CE41-9418-FA1615123FC2.root 
+http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/270000/CF54D079-349C-FB4F-B6E5-3D579D89EDE4.root +http://opendata.cern.ch/eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v2/80000/E964C281-43FB-D349-A436-9A3FDA0BAA28.root +``` +::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::: challenge + +## Exercise 2: Inspect the data file + +:::::::::::::: solution + + +:::::::::::::: + +:::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- Use `.md` files for episodes when you want static content +- Use `.Rmd` files for episodes when you need to generate output +- Run `sandpaper::check_lesson()` to identify any issues with your lesson +- Run `sandpaper::build_lesson()` to preview your lesson locally + +:::::::::::::::::::::::::::::::::::::::::::::::: + +[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 0000000..19ebbc1 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,15 @@ +--- +title: "Contributor Code of Conduct" +--- + +As contributors and maintainers of this project, +we pledge to follow the [The Carpentries Code of Conduct][coc]. + +All workshop participants are also expected to follow the [CERN Code of Conduct](https://cds.cern.ch/record/2240689/files/BrochureCodeofConductEN.pdf?version=1) + +Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our [reporting guidelines][coc-reporting]. 
+ + +[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html +[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 0000000..8c5f6f6 --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,123 @@ +--- +title: "Licenses" +--- + +## Instructional Material + +All CMS instructional material is +made available under the [Creative Commons Attribution +license][cc-by-human]. The following is a human-readable summary of +(and not a substitute for) the [full legal text of the CC BY 4.0 +license][cc-by-legal]. + +You are free: + +* to **Share**---copy and redistribute the material in any medium or format +* to **Adapt**---remix, transform, and build upon the material + +for any purpose, even commercially. + +The licensor cannot revoke these freedoms as long as you follow the +license terms. + +Under the following terms: + +* **Attribution**---You must give appropriate credit (mentioning that + your work is derived from work that is Copyright © The CMS Collaboration + and, where practical, linking to + https://cms.cern/), provide a [link to the + license][cc-by-human], and indicate if changes were made. You may do + so in any reasonable manner, but not in any way that suggests the + licensor endorses you or your use. + +**No additional restrictions**---You may not apply legal terms or +technological measures that legally restrict others from doing +anything the license permits. With the understanding that: + +Notices: + +* You do not have to comply with the license for elements of the + material in the public domain or where your use is permitted by an + applicable exception or limitation. +* No warranties are given. The license may not give you all of the + permissions necessary for your intended use. For example, other + rights such as publicity, privacy, or moral rights may limit how you + use the material. 
+ +## Templates and Framework from software carpentries + +This lesson is built on a framework developed by Software Carpentry. Below is their license: + +All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) +instructional material is made available under the [Creative Commons +Attribution license][cc-by-human]. The following is a human-readable summary of +(and not a substitute for) the [full legal text of the CC BY 4.0 +license][cc-by-legal]. + +You are free: + +- to **Share**---copy and redistribute the material in any medium or format +- to **Adapt**---remix, transform, and build upon the material + +for any purpose, even commercially. + +The licensor cannot revoke these freedoms as long as you follow the license +terms. + +Under the following terms: + +- **Attribution**---You must give appropriate credit (mentioning that your work + is derived from work that is Copyright (c) The Carpentries and, where + practical, linking to ), provide a [link to the + license][cc-by-human], and indicate if changes were made. You may do so in + any reasonable manner, but not in any way that suggests the licensor endorses + you or your use. + +- **No additional restrictions**---You may not apply legal terms or + technological measures that legally restrict others from doing anything the + license permits. With the understanding that: + +Notices: + +* You do not have to comply with the license for elements of the material in + the public domain or where your use is permitted by an applicable exception + or limitation. +* No warranties are given. The license may not give you all of the permissions + necessary for your intended use. For example, other rights such as publicity, + privacy, or moral rights may limit how you use the material. + +## Software + +Except where otherwise noted, the example programs and other software provided +by The Carpentries are made available under the [OSI][osi]-approved [MIT +license][mit-license]. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +## Trademark + +"The Carpentries", "Software Carpentry", "Data Carpentry", and "Library +Carpentry" and their respective logos are registered trademarks of [Community +Initiatives][ci]. + +[cc-by-human]: https://creativecommons.org/licenses/by/4.0/ +[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode +[mit-license]: https://opensource.org/licenses/mit-license.html +[ci]: https://communityin.org/ +[osi]: https://opensource.org diff --git a/config.yaml b/config.yaml new file mode 100644 index 0000000..5ba3617 --- /dev/null +++ b/config.yaml @@ -0,0 +1,85 @@ +#------------------------------------------------------------ +# Values for this lesson. +# Dummy push to see if our varnish updates are taken... +#------------------------------------------------------------ + +# Which carpentry is this (swc, dc, lc, or cp)? 
+# swc: Software Carpentry +# dc: Data Carpentry +# lc: Library Carpentry +# cp: Carpentries (to use for instructor training for instance) +# incubator: The Carpentries Incubator +carpentry: 'incubator' + +# Add value for CMS customized "varnish" +varnish: 'cms-opendata-workshop/varnish' + +# Overall title for pages. +title: 'Exploring CMS nanoAOD' # FIXME + +# Date the lesson was created (YYYY-MM-DD, this is empty by default) +created: 2024-07-13 # FIXME + +# Comma-separated list of keywords for the lesson +keywords: 'software, data, lesson, CMS, physics analysis, analysis' # FIXME + +# Life cycle stage of the lesson +# possible values: pre-alpha, alpha, beta, stable +life_cycle: 'pre-alpha' # FIXME + +# License of the lesson +license: 'CC-BY 4.0' + +# Link to the source repository for this lesson +source: 'https://github.com/cms-opendata-workshop/workshop2024-lesson-exploring-cms-nanoaod' # FIXME + +# Default branch of your lesson +branch: 'main' + +# Who to contact if there are any issues +contact: 'cms-dpoa-coordinator@cern.ch' # FIXME + +# Navigation ------------------------------------------------ +# +# Use the following menu items to specify the order of +# individual pages in each dropdown section. Leave blank to +# include all pages in the folder. +# +# Example ------------- +# +# episodes: +# - introduction.md +# - first-steps.md +# +# learners: +# - setup.md +# +# instructors: +# - instructor-notes.md +# +# profiles: +# - one-learner.md +# - another-learner.md + +# Order of episodes in your lesson +episodes: +- 01-introduction.md +- 02-nanoaod-miniaod.md +- 03-nanoaod-dataset.md +- 04-nanoaod-exercises.md + +# Information for Learners +learners: + +# Information for Instructors +instructors: + +# Learner Profiles +profiles: + +# Customisation --------------------------------------------- +# +# This space below is where custom yaml items (e.g. 
pinning +# sandpaper and varnish versions) should live + + diff --git a/dpoa_nanoaod_sandbox.ipynb b/dpoa_nanoaod_sandbox.ipynb new file mode 100644 index 0000000..b62502c --- /dev/null +++ b/dpoa_nanoaod_sandbox.ipynb @@ -0,0 +1,1285 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6d44161e-000c-492e-b16f-e665d3c24b71", + "metadata": {}, + "source": [ + "# Samples\n", + "\n", + "## Signal\n", + "\n", + "M = 3000 GeV\n", + "\n", + "https://opendata.cern.ch/record/75156\n", + "\n", + "## Backgrounds\n", + "\n", + "### W+jets\n", + "https://opendata.cern.ch/record/69747\n", + "\n", + "or\n", + "\n", + "https://opendata.cern.ch/record/69745\n", + "\n", + "### TT semilep\n", + "https://opendata.cern.ch/record/67993\n", + "\n", + "### TT hadronic\n", + "https://opendata.cern.ch/record/67841\n", + "\n", + "### TT leptonic\n", + "https://opendata.cern.ch/record/19958\n", + "\n", + "\n", + "\n", + "# Trigger\n", + "\n", + "The trigger used for the muon channel is the “OR” combination of the HLT paths `HLT_Mu50_v*`\n", + "and `HLT_TkMu50_v*`. Similarly, the scale factors for this trigger combination were provided by the\n", + "Muon POG [33].\n", + "\n", + "# Kinematics \n", + "\n", + "Referencing this note\n", + "\n", + "[1] M. Adams et al., “Search for ttbar resonances in boosted semileptonic final states at\n", + "√s = 13 TeV”, CMS Analysis Note AN-2015/107 (2015).\n", + "\n", + "for discussions of invariant mass and using the mass of the W to constrain the kinematics of the missing energy in the transverse plane. See Eqn. 3."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6744b800-773d-4d90-95f5-a6b360f39a97", + "metadata": {}, + "outputs": [], + "source": [ + "# Run this if these are not installed and upgraded\n", + "'''\n", + "!pip install --upgrade awkward\n", + "!pip install --upgrade uproot\n", + "\n", + "!pip install --upgrade matplotlib\n", + "\n", + "!pip install vector\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "174179f3-cef1-4ef4-a98b-ec0dd14809cb", + "metadata": {}, + "outputs": [], + "source": [ + "# The classics\n", + "import numpy as np\n", + "import matplotlib.pylab as plt\n", + "import matplotlib # To get the version\n", + "\n", + "import pandas as pd\n", + "\n", + "# The newcomers\n", + "import awkward as ak\n", + "import uproot\n", + "\n", + "import vector\n", + "vector.register_awkward()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5153d2a-de91-4bb3-8637-621e6460126c", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Versions --------\\n\")\n", + "print(f\"{ak.__version__ = }\\n\")\n", + "print(f\"{uproot.__version__ = }\\n\")\n", + "print(f\"{np.__version__ = }\\n\")\n", + "print(f\"{matplotlib.__version__ = }\\n\")\n", + "print(f\"{vector.__version__ = }\\n\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf992912-7633-4c2a-bda1-153b16cd1c94", + "metadata": {}, + "outputs": [], + "source": [ + "####### Backgrounds\n", + "# W+jets\n", + "#dataset = \"Wjets\"\n", + "#filename = 'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/270000/00702195-E707-3743-8BBA-57EB9DEE1DBA.root'\n", + "\n", + "# ttbar leptonic\n", + "#dataset = \"tt_lep\"\n", + "#filename = 
'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIIFall15MiniAODv2/TTTo2L2Nu_13TeV-powheg/MINIAODSIM/PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1/00000/02A468DA-E8B9-E511-942C-0022195E688C.root'\n", + "\n", + "# ttbar hadronic\n", + "#dataset = \"tt_had\"\n", + "#filename = 'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/130000/009086DB-1E42-7545-9A35-1433EC89D04B.root'\n", + "\n", + "# ttbar semileptonic\n", + "#dataset = \"tt_semilep\"\n", + "#filename = 'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/120000/08FCB2ED-176B-064B-85AB-37B898773B98.root'\n", + "\n", + "\n", + "########### Signal\n", + "dataset = 'signal'\n", + "filename = 'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZprimeToTT_M2000_W20_TuneCP2_PSweights_13TeV-madgraph-pythiaMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/270000/22BAB5D2-9E3F-E440-AB30-AE6DBFDF6C83.root'\n", + "\n", + "\n", + "# Open the file \n", + "f = uproot.open(filename)\n", + "\n", + "events = f['Events']\n", + "\n", + "nevents = events.num_entries\n", + "\n", + "print(f\"{nevents = }\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15dc2754-7112-46dd-9e54-ce77f849f870", + "metadata": {}, + "outputs": [], + "source": [ + "def pretty_print(fields, fmt='40s', require=None, ignore=None):\n", + " \n", + " output = \"\"\n", + " \n", + " for f in fields:\n", + " PASSED = True\n", + " if require is not None:\n", + " if type(require) != list:\n", + " require = [require]\n", + " PASSED = True\n", + " for r in require:\n", + " if f.find(r) < 0:\n", + " PASSED = False\n", + " \n", + " # Did not find a string and so skip\n", + " if PASSED is False:\n", + " continue\n", + " \n", + " if ignore is not None:\n", + " if f.find(ignore) >= 0:\n", + " continue\n", + " 
\n", + " if len(output) + len(f) <= 80:\n", + " output += f\"{f:{fmt}} \"\n", + " else:\n", + " print(output)\n", + " output = f\"{f:{fmt}} \"\n", + " \n", + " print(output)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "99f71009-167e-4caf-bfbe-3d41f0b754e8", + "metadata": {}, + "outputs": [], + "source": [ + "# Pretty print all the fields\n", + "#pretty_print(events.keys())\n", + "\n", + "# Pretty print some subsets\n", + "#pretty_print(events.keys(), fmt='30s', require='FatJet')\n", + "#pretty_print(events.keys(), fmt='40s', require=['Muon', 'Iso'], ignore='HLT')\n", + "#pretty_print(events.keys(), fmt='40s', require=['HLT', 'TkMu50'])\n", + "#pretty_print(events.keys(), fmt='40s', require='HLT')\n", + "#pretty_print(events.keys(), fmt='40s', require='Jet_', ignore='Fat')\n", + "pretty_print(events.keys(), fmt='40s', require='PuppiMET', ignore='Raw')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a23de6b-5913-4b64-ae88-da06467e3070", + "metadata": {}, + "outputs": [], + "source": [ + "fatjet_mSD = events['FatJet_msoftdrop'].array()\n", + "\n", + "fatjet_tag = events['FatJet_particleNet_TvsQCD'].array()\n", + "\n", + "fatjet_tau2 = events['FatJet_tau2'].array()\n", + "fatjet_tau3 = events['FatJet_tau3'].array()\n", + "\n", + "fatjet_pt = events['FatJet_pt'].array()\n", + "fatjet_eta = events['FatJet_eta'].array()\n", + "fatjet_phi = events['FatJet_phi'].array()\n", + "fatjet_mass = events['FatJet_mass'].array()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e0d8bbfd-4255-4fdb-9111-c1f3921cc5c0", + "metadata": {}, + "outputs": [], + "source": [ + "muon_pt = events['Muon_pt'].array()\n", + "muon_eta = events['Muon_eta'].array()\n", + "muon_phi = events['Muon_phi'].array()\n", + "muon_mass = events['Muon_mass'].array()\n", + "\n", + "muon_iso = events['Muon_miniIsoId'].array()\n", + "\n", + "muon_tightId = events['Muon_tightId'].array()\n" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "a92d1ea3-038f-45e6-8cd3-809b07f88378", + "metadata": {}, + "outputs": [], + "source": [ + "jet_btag = events['Jet_btagDeepB'].array()\n", + "\n", + "jet_jetid = events['Jet_jetId'].array()\n", + "\n", + "jet_pt = events['Jet_pt'].array()\n", + "jet_eta = events['Jet_eta'].array()\n", + "jet_phi = events['Jet_phi'].array()\n", + "jet_mass = events['Jet_mass'].array()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e966ea4f-93dd-441a-9501-ea04fb3c4323", + "metadata": {}, + "outputs": [], + "source": [ + "met_pt = events['PuppiMET_pt'].array()\n", + "met_eta = 0*events['PuppiMET_pt'].array() # Fix this to be 0\n", + "met_phi = events['PuppiMET_phi'].array() \n", + "met_energy = events['PuppiMET_sumEt'].array() # Is this the right thing to use?\n", + "\n", + "ht_lep = muon_pt + met_pt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "abfde43a-6b23-4950-8bdd-2ec5dbe660bd", + "metadata": {}, + "outputs": [], + "source": [ + "# Cuts\n", + "tau32 = fatjet_tau3/fatjet_tau2\n", + "\n", + "#cut_fatjet = (tau32>0.67) & (fatjet_eta>-2.4) & (fatjet_eta<2.4) & (fatjet_mSD>105) & (fatjet_mSD<220)\n", + "cut_fatjet = (fatjet_pt > 500) & (fatjet_tag > 0.5)\n", + "\n", + "cut_muon = (muon_pt>55) & (muon_eta>-2.4) & (muon_eta<2.4) & \\\n", + " (muon_tightId == True) & (muon_iso>1) & (ht_lep>150)\n", + "\n", + "cut_jet = (jet_btag > 0.5) & (jet_jetid>=4)\n", + "\n", + "\n", + "\n", + "# Event cut\n", + "cut_met = (met_pt > 50)\n", + "\n", + "cut_nmuons = ak.num(cut_muon[cut_muon]) == 1\n", + "\n", + "cut_trigger = (events['HLT_TkMu50'].array())\n", + "\n", + "\n", + "cut_ntop = ak.num(cut_fatjet[cut_fatjet]) == 1\n", + "\n", + "cut_full_event = cut_trigger & cut_nmuons & cut_met & cut_ntop" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c584a5cc-c568-438d-8abd-ba5636feefdb", + "metadata": {}, + "outputs": [], + "source": [ + "fatjets = ak.zip(\n", + " {\"pt\": 
fatjet_pt[cut_full_event][cut_fatjet[cut_full_event]], \n", + " \"eta\": fatjet_eta[cut_full_event][cut_fatjet[cut_full_event]], \n", + " \"phi\": fatjet_phi[cut_full_event][cut_fatjet[cut_full_event]], \n", + " \"mass\": fatjet_mass[cut_full_event][cut_fatjet[cut_full_event]]},\n", + " with_name=\"Momentum4D\",\n", + ")\n", + "\n", + "muons = ak.zip(\n", + " {\"pt\": muon_pt[cut_full_event][cut_muon[cut_full_event]], \n", + " \"eta\": muon_eta[cut_full_event][cut_muon[cut_full_event]], \n", + " \"phi\": muon_phi[cut_full_event][cut_muon[cut_full_event]], \n", + " \"mass\": muon_mass[cut_full_event][cut_muon[cut_full_event]]},\n", + " with_name=\"Momentum4D\",\n", + ")\n", + "\n", + "jets = ak.zip(\n", + " {\"pt\": jet_pt[cut_full_event][cut_jet[cut_full_event]], \n", + " \"eta\": jet_eta[cut_full_event][cut_jet[cut_full_event]], \n", + " \"phi\": jet_phi[cut_full_event][cut_jet[cut_full_event]], \n", + " \"mass\": jet_mass[cut_full_event][cut_jet[cut_full_event]]},\n", + " with_name=\"Momentum4D\",\n", + ")\n", + "\n", + "met = ak.zip(\n", + " {\"pt\": met_pt[cut_full_event], \n", + " \"eta\": met_eta[cut_full_event], \n", + " \"phi\": met_phi[cut_full_event], \n", + " \"e\": met_energy[cut_full_event]},\n", + " with_name=\"Momentum4D\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5919dd32-78b1-49e8-b67f-b99ce7b7b45b", + "metadata": {}, + "outputs": [], + "source": [ + "p4mu,p4fj,p4j,p4met = ak.unzip(ak.cartesian([muons, fatjets, jets, met]))\n" + ] + }, + { + "cell_type": "markdown", + "id": "9cf21388-9669-48b6-b736-47fd50a32f96", + "metadata": {}, + "source": [ + "## Trying to get the W mass constraint working \n", + "\n", + "Maybe we ignore this in the end. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05acb49c-13cc-4986-8393-89131112e29a", + "metadata": {}, + "outputs": [], + "source": [ + "#newmet = vector.Array([{\"x\":met.x, \"y\":met.y, \"z\":met.z, \"e\":met.e}])\n", + "\n", + "newmet = ak.zip(\n", + " {\"x\": p4met.x, \n", + " \"y\": p4met.y,\n", + " \"z\": tempz,\n", + " \"e\": p4met.e\n", + " }, with_name=\"Momentum4D\",\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d6225b2e-336c-4c57-882b-2977a4ec4efd", + "metadata": {}, + "outputs": [], + "source": [ + "print(met)\n", + "print()\n", + "print(newmet)\n", + "\n", + "print()\n", + "\n", + "print(met.x)\n", + "print()\n", + "print(newmet.x)\n", + "\n", + "print()\n", + "print(met.z)\n", + "print()\n", + "print(newmet.z)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3024c5a5-102a-4819-9cf7-489596b934af", + "metadata": {}, + "outputs": [], + "source": [ + "#p4mu,p4fj,p4j,p4met = ak.unzip(ak.cartesian([muons, fatjets, jets, newmet]))\n", + "p4mu,p4fj,p4j,p4met = ak.unzip(ak.cartesian([muons, fatjets, jets, met]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae1c6c08-26ba-4981-a058-d83cb91e3a02", + "metadata": {}, + "outputs": [], + "source": [ + "p4tot = p4mu + p4fj + p4j + p4met" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f824de0c-a648-4311-b451-04109f3496c6", + "metadata": {}, + "outputs": [], + "source": [ + "plt.hist(ak.flatten(p4tot.mass),bins=50, range=(0,7000));" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ec26d4c2-5059-48dc-9fe6-9e43f6b8b55a", + "metadata": {}, + "outputs": [], + "source": [ + "mydict = {}\n", + "mydict['mtt'] = ak.flatten(p4tot.mass) \n", + "mydict['mu_pt'] = ak.flatten(p4mu.pt) \n", + "\n", + "df = pd.DataFrame.from_dict(mydict)\n", + "\n", + "df\n", + "\n", + "outfilename = f\"output_{dataset}_{filename.split('/')[-1].split('.')[0]}.csv\"\n", + 
"print(outfilename)\n", + "\n", + "df.to_csv(outfilename, index=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ac6b187-790d-409e-bba5-e4e970282bac", + "metadata": {}, + "outputs": [], + "source": [ + "!cat output_signal_22BAB5D2-9E3F-E440-AB30-AE6DBFDF6C83.csv" + ] + }, + { + "cell_type": "markdown", + "id": "a8b76c2a-4c5c-4c3a-8fec-68c89a1fc179", + "metadata": {}, + "source": [ + "# Sandbox" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "57cc38f0-c3ab-4d84-8eaf-cd9805801159", + "metadata": {}, + "outputs": [], + "source": [ + "# Run this if these are not installed\n", + "'''\n", + "!pip install --upgrade awkward\n", + "!pip install --upgrade uproot\n", + "\n", + "!pip install coffea\n", + "\n", + "!pip install --upgrade matplotlib\n", + "\n", + "!pip install vector\n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "161fb34f-736f-40d3-af80-cd195bed55d5", + "metadata": {}, + "outputs": [], + "source": [ + "import awkward as ak\n", + "import uproot\n", + "\n", + "import coffea\n", + "\n", + "from coffea.nanoevents import NanoEventsFactory, NanoAODSchema\n", + "\n", + "import numpy as np\n", + "\n", + "import matplotlib.pylab as plt\n", + "\n", + "import matplotlib\n", + "\n", + "import vector\n", + "vector.register_awkward()\n", + "\n", + "import pandas as pd\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13608cc3-9c7d-4401-a21b-e9583a21bdcd", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"{ak.__version__ = }\")\n", + "print(f\"{uproot.__version__ = }\")\n", + "print(f\"{coffea.__version__ = }\")\n", + "print(f\"{np.__version__ = }\")\n", + "print(f\"{matplotlib.__version__ = }\")\n", + "print(f\"{vector.__version__ = }\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5750ad90-a5cd-4fa5-806e-5e41719978c2", + "metadata": {}, + "outputs": [], + "source": [ + "# Signal\n", + "filename = 
'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/ZprimeToTT_M2000_W20_TuneCP2_PSweights_13TeV-madgraph-pythiaMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/270000/22BAB5D2-9E3F-E440-AB30-AE6DBFDF6C83.root'\n", + "\n", + "\n", + "# TT to semilep\n", + "#filename = 'root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/TTToSemiLeptonic_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/120000/08FCB2ED-176B-064B-85AB-37B898773B98.root'\n", + "\n", + "f = uproot.open(filename)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62963977-b776-4d79-9eb3-894f7e0a5c03", + "metadata": {}, + "outputs": [], + "source": [ + "#fname = \"https://raw.githubusercontent.com/CoffeaTeam/coffea/master/tests/samples/nano_dy.root\"\n", + "#events = NanoEventsFactory.from_root(\n", + "# {fname: \"Events\"},\n", + "# schemaclass=NanoAODSchema,\n", + "#).events()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "921d199a-d76e-404a-8798-58ae536e2448", + "metadata": {}, + "outputs": [], + "source": [ + "events = f['Events']\n", + "\n", + "#arrays = events.arrays(filter_name=['Muon_*', 'Jet_*'])\n", + "\n", + "\n", + "#arrays\n", + "#events.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0ab12c5-1234-458b-88b8-ebc90a3cfd4b", + "metadata": {}, + "outputs": [], + "source": [ + "def pretty_print(fields, fmt='40s', require=None, ignore=None):\n", + " \n", + " output = \"\"\n", + " \n", + " for f in fields:\n", + " PASSED = True\n", + " if require is not None:\n", + " if type(require) != list:\n", + " require = [require]\n", + " PASSED = True\n", + " for r in require:\n", + " if f.find(r) < 0:\n", + " PASSED = False\n", + " \n", + " # Did not find a string and so skip\n", + " if PASSED is False:\n", + " continue\n", + " \n", + " if ignore is not None:\n", + " if f.find(ignore) >= 0:\n", + " continue\n", + " \n", + " if len(output) + len(f) <= 80:\n", + " 
output += f\"{f:{fmt}} \"\n", + " else:\n", + " print(output)\n", + " output = f\"{f:{fmt}} \"\n", + " \n", + " print(output)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa5815d5-f746-4f79-9235-7c451e0fe8d1", + "metadata": {}, + "outputs": [], + "source": [ + "#pretty_print(events.keys())\n", + "\n", + "pretty_print(events.keys(), fmt='30s', require='FatJet')\n", + "#pretty_print(events.keys(), fmt='40s', require=['Muon', 'Iso'], ignore='HLT')\n", + "#pretty_print(events.keys(), fmt='40s', require=['HLT', 'TkMu50'])\n", + "#pretty_print(events.keys(), fmt='40s', require='HLT')\n", + "#pretty_print(events.keys(), fmt='40s', require='Jet_', ignore='Fat')\n", + "#pretty_print(events.keys(), fmt='40s', require='PuppiMET')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f696330f-d589-417c-91e1-1493a0a821de", + "metadata": {}, + "outputs": [], + "source": [ + "cut_trigger = events['HLT_TkMu50'].array()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fba43ba-6429-469a-999f-8dd1a8c3e58e", + "metadata": {}, + "outputs": [], + "source": [ + "cut_full_event = cut_trigger" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "339e9c4a-1693-44af-abf6-ac99c781f92b", + "metadata": {}, + "outputs": [], + "source": [ + "len(cut_full_event)\n", + "cut_full_event\n", + "ak.num(events['FatJet_eta'].array(), axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0569d1b6-63a5-4612-998f-070dc97ffa4b", + "metadata": {}, + "outputs": [], + "source": [ + "ak.num(events['FatJet_eta'].array()[cut_full_event], axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "367975c0-b6b9-4c1f-aad0-0e1313f79d6e", + "metadata": {}, + "outputs": [], + "source": [ + "#branches = events.arrays()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96e8852f-f1d2-47ec-bc90-b86c2b1aadfa", + "metadata": {}, + "outputs": [], + "source": [ + 
"fatjet_pt = events['FatJet_pt'].array()\n", + "cut_temp = fatjet_pt>400\n", + "\n", + "print(len(fatjet_pt))\n", + "print(fatjet_pt[0:10])\n", + "\n", + "fatjet_pt = events['FatJet_pt'].array()[cut_full_event][cut_temp[cut_full_event]]\n", + "#fatjet_pt = events['FatJet_pt'].array()[cut_temp[cut_full_event]]\n", + "\n", + "print(len(fatjet_pt))\n", + "print(fatjet_pt[0:10])\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b6574e8b-bb71-4509-8ce6-d7a422c3eaa2", + "metadata": {}, + "outputs": [], + "source": [ + "fatjet_mSD = events['FatJet_msoftdrop'].array()#[cut_full_event]\n", + "fatjet_eta = events['FatJet_eta'].array()#[cut_full_event]\n", + "\n", + "fatjet_tag = events['FatJet_particleNet_TvsQCD'].array()\n", + "\n", + "fatjet_tau2 = events['FatJet_tau2'].array()#[cut_full_event]\n", + "fatjet_tau3 = events['FatJet_tau3'].array()#[cut_full_event]\n", + "\n", + "fatjet_pt = events['FatJet_pt'].array()#[cut_full_event]\n", + "fatjet_eta = events['FatJet_eta'].array()#[cut_full_event]\n", + "fatjet_phi = events['FatJet_phi'].array()#[cut_full_event]\n", + "fatjet_mass = events['FatJet_mass'].array()#[cut_full_event]\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0d17bc9-bc72-416b-9530-d45d770ca366", + "metadata": {}, + "outputs": [], + "source": [ + "muon_pt = events['Muon_pt'].array()#[cut_full_event]\n", + "muon_eta = events['Muon_eta'].array()#[cut_full_event]\n", + "muon_phi = events['Muon_phi'].array()#[cut_full_event]\n", + "muon_mass = events['Muon_mass'].array()#[cut_full_event]\n", + "\n", + "muon_iso = events['Muon_miniIsoId'].array()\n", + "\n", + "muon_tightId = events['Muon_tightId'].array()#[cut_full_event]\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3909db13-a14b-4419-a0c9-59b6c1465f5f", + "metadata": {}, + "outputs": [], + "source": [ + "jet_btag = events['Jet_btagDeepB'].array()#[cut_full_event]\n", + "\n", + "jet_jetid = 
events['Jet_jetId'].array()\n", + "\n", + "jet_pt = events['Jet_pt'].array()#[cut_full_event]\n", + "jet_eta = events['Jet_eta'].array()#[cut_full_event]\n", + "jet_phi = events['Jet_phi'].array()#[cut_full_event]\n", + "jet_mass = events['Jet_mass'].array()#[cut_full_event]\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "45474c93-e0ba-42bb-8fd0-feb2baf8f5ee", + "metadata": {}, + "outputs": [], + "source": [ + "jet_mass" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "76285cac-b1ba-41d3-b4da-7e76f7dfa1e7", + "metadata": {}, + "outputs": [], + "source": [ + "met_pt = events['PuppiMET_pt'].array()#[cut_full_event]\n", + "met_eta = 0*events['PuppiMET_pt'].array()#[cut_full_event] # Fix this to be 0\n", + "met_phi = events['PuppiMET_phi'].array()#[cut_full_event] \n", + "met_energy = events['PuppiMET_pt'].array()#[cut_full_event] # Massless MET with eta = 0, so E = pt; sumEt is the event-wide scalar sum of E_T\n", + "\n", + "ht_lep = muon_pt + met_pt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59b25efe-4afd-400a-a3d2-aac6b50c4fda", + "metadata": {}, + "outputs": [], + "source": [ + "#met_energy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a381ec34-0232-4b2e-8d9e-6815f86ad10e", + "metadata": {}, + "outputs": [], + "source": [ + "#jet_jetid" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a062946-add8-4966-82bd-46661672587b", + "metadata": {}, + "outputs": [], + "source": [ + "# Cuts\n", + "tau32 = fatjet_tau3/fatjet_tau2\n", + "\n", + "#cut_fatjet = (tau32>0.67) & (fatjet_eta>-2.4) & (fatjet_eta<2.4) & (fatjet_mSD>105) & (fatjet_mSD<220)\n", + "cut_fatjet = (fatjet_pt > 500) & (fatjet_tag > 0.5)\n", + "\n", + "cut_muon = (muon_pt>55) & (muon_eta>-2.4) & (muon_eta<2.4) & \\\n", + " (muon_tightId == True) & (muon_iso>1) & (ht_lep>150)\n", + "\n", + "cut_jet = (jet_btag > 0.5) & (jet_jetid>=4)\n", + "\n", + "\n", + "\n", + "# Event cut\n", + "cut_met = (met_pt > 50)\n", + "\n", + "cut_nmuons = 
ak.num(cut_muon[cut_muon]) == 1\n", + "\n", + "cut_trigger = (events['HLT_TkMu50'].array())\n", + "\n", + "\n", + "cut_ntop = ak.num(cut_fatjet[cut_fatjet]) == 1\n", + "\n", + "cut_full_event = cut_trigger & cut_nmuons & cut_met & cut_ntop# & cut_ht" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05869c14-9078-4f34-87d2-863764b802d7", + "metadata": {}, + "outputs": [], + "source": [ + "#events['Muon_pt'].array()[cut_met]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0d41afc-71ac-4dd8-bf71-f82afd62a2e7", + "metadata": {}, + "outputs": [], + "source": [ + "#cutn_muons" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c9682f35-67c9-42d9-9cb0-1929ea0cc7fd", + "metadata": {}, + "outputs": [], + "source": [ + "#cut_trigger = events['HLT_TkMu50'].array()\n", + "\n", + "#met_pt = events['PuppiMET_pt'].array()\n", + "#cut_met = met_pt > 50\n", + "\n", + "#cut_full_event = cut_trigger & cut_met\n", + "\n", + "'''\n", + "fatjet_pt = events['FatJet_pt'].array()\n", + "fatjet_eta = events['FatJet_eta'].array()\n", + "fatjet_phi = events['FatJet_phi'].array()\n", + "fatjet_mass = events['FatJet_mass'].array()\n", + "\n", + "muon_pt = events['Muon_pt'].array()\n", + "muon_eta = events['Muon_eta'].array()\n", + "muon_phi = events['Muon_phi'].array()\n", + "muon_mass = events['Muon_mass'].array()\n", + "\n", + "jet_pt = events['Jet_pt'].array()\n", + "jet_eta = events['Jet_eta'].array()\n", + "jet_phi = events['Jet_phi'].array()\n", + "jet_mass = events['Jet_mass'].array()\n", + "\n", + "met_pt = events['PuppiMET_pt'].array()\n", + "met_eta = 0*events['PuppiMET_pt'].array() # Fix this to be 0\n", + "met_phi = events['PuppiMET_phi'].array() \n", + "met_energy = events['PuppiMET_sumEt'].array() \n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ebd9aa97-49e5-47aa-80d1-4ad9c056e034", + "metadata": {}, + "outputs": [], + "source": [ + "#cut_trigger\n", + "#cut_met" + ] + }, + { 
+ "cell_type": "code", + "execution_count": null, + "id": "f7c2c0d1-e906-424f-9dea-28627df29bfa", + "metadata": {}, + "outputs": [], + "source": [ + "fatjets = ak.zip(\n", + " {\"pt\": fatjet_pt[cut_full_event][cut_fatjet[cut_full_event]], \n", + " \"eta\": fatjet_eta[cut_full_event][cut_fatjet[cut_full_event]], \n", + " \"phi\": fatjet_phi[cut_full_event][cut_fatjet[cut_full_event]], \n", + " \"mass\": fatjet_mass[cut_full_event][cut_fatjet[cut_full_event]]},\n", + " with_name=\"Momentum4D\",\n", + ")\n", + "\n", + "muons = ak.zip(\n", + " {\"pt\": muon_pt[cut_full_event][cut_muon[cut_full_event]], \n", + " \"eta\": muon_eta[cut_full_event][cut_muon[cut_full_event]], \n", + " \"phi\": muon_phi[cut_full_event][cut_muon[cut_full_event]], \n", + " \"mass\": muon_mass[cut_full_event][cut_muon[cut_full_event]]},\n", + " with_name=\"Momentum4D\",\n", + ")\n", + "\n", + "jets = ak.zip(\n", + " {\"pt\": jet_pt[cut_full_event][cut_jet[cut_full_event]], \n", + " \"eta\": jet_eta[cut_full_event][cut_jet[cut_full_event]], \n", + " \"phi\": jet_phi[cut_full_event][cut_jet[cut_full_event]], \n", + " \"mass\": jet_mass[cut_full_event][cut_jet[cut_full_event]]},\n", + " with_name=\"Momentum4D\",\n", + ")\n", + "\n", + "met = ak.zip(\n", + " {\"pt\": met_pt[cut_full_event], \n", + " \"eta\": met_eta[cut_full_event], \n", + " \"phi\": met_phi[cut_full_event], \n", + " \"e\": met_energy[cut_full_event]},\n", + " with_name=\"Momentum4D\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "429be9ae-0a5f-4562-99e8-e3dfa1267f4a", + "metadata": {}, + "outputs": [], + "source": [ + "#cut_fatjet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36d44469-802a-4928-8b7a-84648502be04", + "metadata": {}, + "outputs": [], + "source": [ + "#len(cut_met[cut_met])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a66938ba-bedc-4d6c-b109-c0b0e98f386d", + "metadata": {}, + "outputs": [], + "source": [ + "#cut_met" + ] + }, 
+ { + "cell_type": "code", + "execution_count": null, + "id": "e358a4e6-30d1-4801-ba61-f1f7525af406", + "metadata": {}, + "outputs": [], + "source": [ + "#cut_met & cut_fatjet" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa6b0a9f-e593-4312-86a2-4c5fd9bc0dba", + "metadata": {}, + "outputs": [], + "source": [ + "#fatjets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f36da78-fc3c-48d4-a09d-a56aec3e1bc5", + "metadata": {}, + "outputs": [], + "source": [ + "#muons" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3884b6b-6b5d-44c5-8267-e4d02deef4fe", + "metadata": {}, + "outputs": [], + "source": [ + "#jets" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69293f72-5e29-4a2a-a0e8-5583d735e25a", + "metadata": {}, + "outputs": [], + "source": [ + "#met" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81c90e81-1e68-4cab-bc21-9ea3093abf01", + "metadata": {}, + "outputs": [], + "source": [ + "#p4mu = ak.unzip(ak.combinations(muons,1))\n", + "#p4fj = ak.unzip(ak.combinations(fatjets,1))\n", + "\n", + "#p4mu1,p4mu2 = ak.unzip(ak.combinations(muons,2))\n", + "#p4fj1,p4fj2 = ak.unzip(ak.combinations(fatjets,2))\n", + "\n", + "p4mu,p4fj,p4j,p4met = ak.unzip(ak.cartesian([muons, fatjets, jets, met]))\n", + "\n", + "# Because these are only 1 we need at a time, we handle them differently\n", + "#p4mu = ak.unzip(ak.zip((muons,)))\n", + "#p4fj = ak.unzip(ak.zip((fatjets,)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cae989a2-d6b5-4b1f-90ff-487704c05b78", + "metadata": {}, + "outputs": [], + "source": [ + "#p4j\n", + "#p4met" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aafd99a2-2b41-4738-b2ad-8db6ac411d46", + "metadata": {}, + "outputs": [], + "source": [ + "p4tot = p4mu + p4fj + p4j + p4met\n", + "#p4tot = p4mu1 + p4mu2\n", + "\n", + "#p4tot = ak.cartesian([p4mu,p4fj])\n", + "#p4tot = 
ak.cartesian([muons, fatjets])\n", + "\n", + "#m = ak.unzip(p4tot).mass\n", + "#x = ak.unzip(p4tot)\n", + "\n", + "#plt.hist(ak.unflatten(m),bins=50);" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "92f5140c-34da-4917-be83-33170308854b", + "metadata": {}, + "outputs": [], + "source": [ + "n = 5\n", + "\n", + "print('mu')\n", + "print(p4mu[n].pt, p4mu[n].x, p4mu[n].y, p4mu[n].z, p4mu[n].e)\n", + "print('fatjet')\n", + "print(p4fj[n].pt, p4fj[n].x, p4fj[n].y, p4fj[n].z, p4fj[n].e)\n", + "print('jet')\n", + "print(p4j[n].pt, p4j[n].x, p4j[n].y, p4j[n].z, p4j[n].e)\n", + "print('met')\n", + "print(p4met[n].pt, p4met[n].x, p4met[n].y, p4met[n].z, p4met[n].e)\n", + "print('tot')\n", + "print(p4tot[n].pt, p4tot[n].x, p4tot[n].y, p4tot[n].z, p4tot[n].e, p4tot[n].m)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a66ab86-2515-4a93-9b03-4f6a09c0c4f3", + "metadata": {}, + "outputs": [], + "source": [ + "n0 = 0\n", + "n1 = 10\n", + "\n", + "print(p4mu[n0:n1])\n", + "print(p4fj[n0:n1])\n", + "print(p4j[n0:n1])\n", + "print(p4met[n0:n1])\n", + "print(p4tot[n0:n1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "efe5066d-349e-4918-aeef-2c6576d8c85a", + "metadata": {}, + "outputs": [], + "source": [ + "#p4fj[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d209285-5c14-4bb7-80d5-bb26d875c619", + "metadata": {}, + "outputs": [], + "source": [ + "#muons#[0][0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4f97640f-1342-4280-83df-73cccce484a0", + "metadata": {}, + "outputs": [], + "source": [ + "p4tot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34d0f5e5-d085-4d66-af87-2f7ada8f25f0", + "metadata": {}, + "outputs": [], + "source": [ + "#p4tot.mass" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "354a3981-542b-4f57-aefd-6db90f2df1c4", + "metadata": {}, + "outputs": [], + "source": [ + 
"plt.hist(ak.flatten(p4tot.mass),bins=50, range=(0,7000));" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a89d9b99-0bce-4ef8-a264-4aa2d5f78d4a", + "metadata": {}, + "outputs": [], + "source": [ + "mydict = {}\n", + "mydict['mtt'] = ak.flatten(p4tot.mass) \n", + "mydict['mu_pt'] = ak.flatten(p4mu.pt) \n", + "\n", + "df = pd.DataFrame.from_dict(mydict)\n", + "\n", + "\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c6de908-55d9-4e8e-916e-2601f70d24c0", + "metadata": {}, + "outputs": [], + "source": [ + "cut_event_level = events['PuppiMET_pt'].array() > 50\n", + "\n", + "muon_pt = events['Muon_pt'].array()\n", + "\n", + "cut_muon = muon_pt > 35\n", + "\n", + "selected_muons = muon_pt[cut_event_level][cut_muon[cut_event_level]]\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66bfb64a-14b5-456e-ad68-a40e9d56da04", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "27748ffc-67b1-47ed-8b22-d59049fbe249", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8e564174-ba68-4cb2-8af0-dab51a739761", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "003c13a5-e962-4834-bb96-2ad819d43fd8", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "26c1d381-9c01-4af4-a9af-d151d269c28f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c65d08ca-a53b-4471-9773-cc12c1d370b0", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47c6c161-8081-45e1-97f2-e6536698e1b9", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + 
"language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/fig/MiniAODTable.png b/fig/MiniAODTable.png new file mode 100644 index 0000000..74f3505 Binary files /dev/null and b/fig/MiniAODTable.png differ diff --git a/fig/ZprimeODP-results.png b/fig/ZprimeODP-results.png new file mode 100644 index 0000000..a432fa9 Binary files /dev/null and b/fig/ZprimeODP-results.png differ diff --git a/fig/ZprimeODP-results2.png b/fig/ZprimeODP-results2.png new file mode 100644 index 0000000..ae4c6c8 Binary files /dev/null and b/fig/ZprimeODP-results2.png differ diff --git a/fig/ZprimeODP.png b/fig/ZprimeODP.png new file mode 100644 index 0000000..4262b1e Binary files /dev/null and b/fig/ZprimeODP.png differ diff --git a/fig/ZprimeToTT-results.png b/fig/ZprimeToTT-results.png new file mode 100644 index 0000000..162e09e Binary files /dev/null and b/fig/ZprimeToTT-results.png differ diff --git a/index.md b/index.md new file mode 100644 index 0000000..af66276 --- /dev/null +++ b/index.md @@ -0,0 +1,9 @@ +--- +site: sandpaper::sandpaper_site +--- + +This is a new lesson built with [The Carpentries Workbench][workbench]. + + +[workbench]: https://carpentries.github.io/sandpaper-docs + diff --git a/instructor-notes.md b/instructor-notes.md new file mode 100644 index 0000000..d9a67aa --- /dev/null +++ b/instructor-notes.md @@ -0,0 +1,5 @@ +--- +title: 'Instructor Notes' +--- + +This is a placeholder file. Please add content here. diff --git a/learner-profiles.md b/learner-profiles.md new file mode 100644 index 0000000..434e335 --- /dev/null +++ b/learner-profiles.md @@ -0,0 +1,5 @@ +--- +title: FIXME +--- + +This is a placeholder file. Please add content here. 
diff --git a/links.md b/links.md new file mode 100644 index 0000000..4c5cd2f --- /dev/null +++ b/links.md @@ -0,0 +1,10 @@ + + +[pandoc]: https://pandoc.org/MANUAL.html +[r-markdown]: https://rmarkdown.rstudio.com/ +[rstudio]: https://www.rstudio.com/ +[carpentries-workbench]: https://carpentries.github.io/sandpaper-docs/ + diff --git a/md5sum.txt b/md5sum.txt new file mode 100644 index 0000000..c31e64b --- /dev/null +++ b/md5sum.txt @@ -0,0 +1,14 @@ +"file" "checksum" "built" "date" +"CODE_OF_CONDUCT.md" "5c4d86d98d42e29a21d1bf7408bfcb56" "site/built/CODE_OF_CONDUCT.md" "2024-06-19" +"LICENSE.md" "f9ad111d7060980f6a15572ff849dbc6" "site/built/LICENSE.md" "2024-06-19" +"config.yaml" "ca0c1d383008c0511dbd28fe4963d528" "site/built/config.yaml" "2024-07-23" +"index.md" "a02c9c785ed98ddd84fe3d34ddb12fcd" "site/built/index.md" "2024-06-19" +"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-06-19" +"episodes/01-introduction.md" "071ce7b92b927ab433dce73a8449fd9a" "site/built/01-introduction.md" "2024-07-23" +"episodes/02-nanoaod-miniaod.md" "19f80ef488f8bbb48b2619f4ef84ad15" "site/built/02-nanoaod-miniaod.md" "2024-07-24" +"episodes/03-nanoaod-dataset.md" "286712650ebb187b2c69f5aa00492a37" "site/built/03-nanoaod-dataset.md" "2024-07-23" +"episodes/04-nanoaod-exercises.md" "2bc567228a634cb85f32cf1838941cb3" "site/built/04-nanoaod-exercises.md" "2024-07-23" +"instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-06-19" +"learners/reference.md" "1c7cc4e229304d9806a13f69ca1b8ba4" "site/built/reference.md" "2024-06-19" +"learners/setup.md" "dc5c6fe3c7b1422c68ec0c387bf6b3a1" "site/built/setup.md" "2024-07-23" +"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-06-19" diff --git a/reference.md b/reference.md new file mode 100644 index 0000000..ba26b9f --- /dev/null +++ b/reference.md @@ -0,0 +1,8 @@ +--- +title: 'Reference' +--- + +## Glossary + 
+This is a placeholder file. Please add content here. + diff --git a/setup.md b/setup.md new file mode 100644 index 0000000..d7027bc --- /dev/null +++ b/setup.md @@ -0,0 +1,15 @@ +--- +title: Summary and Setup +--- + +:::::: prereq + +You will be using what you learned in the ["Docker containers"](https://cms-opendata-workshop.github.io/workshop2024-lesson-docker/instructor/index.html) and ["Open Data analysis in C++ and Python"](https://cms-opendata-workshop.github.io/workshop2024-lesson-cpp-root-python/instructor/index.html) pre-exercises. + +In addition, you should have gone through the ["Find and using open data"](https://cms-opendata-workshop.github.io/workshop2024-lesson-dataset-scouting/instructor/index.html) pre-exercise. + +:::::: + + + +