Commit

feat: add validation workflow from MC & add 2023 files (#91)
* test file for QCD

* QCD Pt binned samples for QCD workflow

* Data samples for QCD workflow

* add QCD workflow selection

* add UL18 files for QCD

* add histograms for QCD workflow

* PU weights for QCD workflow

* HLT_PFJet140 prescales

* add variables for QCD workflow

* add QCD Pt binned samples xsection

* QCD workflow, need debug

* add QCD workflow

* add QCD plotting

* prescales for HLT_PFJet140

* script to prepare json file for prescales

* csv file contains HLT_PFJet140 prescales information

* fix : add new files & change tagger axis  & minor fixes
- metadata : modified file list
- data: add jetveto ap
- wf : remove outlier config
- scripts: fixes for plotting code for comparison plot

* feat: add condor resubmission scripts & fix DY hists

* feat : add validation workflow & ROC/efficiency script

* fix: correction

* fix: correction

* fix: correction implementation & add pu info

* fix: add pv hists

* fix:axis

* fix:minor

* fix: compatible with current changes

* fix: working QCD

* feat: fixed QCD

* feat: add veto

* feat: add Summer23 info

* fix : format

* feat : change JEC implementation

* fix : workflow

* fix : remove dependency python 3.8

* feat : minor fixes

* feat: add xsection

---------

Co-authored-by: hhsia <[email protected]>
Co-authored-by: uttiyasarkar <[email protected]>
Co-authored-by: pmatorras <[email protected]>
Co-authored-by: hsinweihsia <[email protected]>
5 people authored Mar 28, 2024
1 parent f69a3b4 commit 70f87f1
Showing 381 changed files with 79,444 additions and 15,844 deletions.
7 changes: 4 additions & 3 deletions .github/workflows/BTA_workflow.yml
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: ["3.8","3.10"]
python-version: ["3.10"]

defaults:
run:
@@ -87,6 +87,7 @@ jobs:
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
@@ -98,9 +99,9 @@ jobs:
- name: BTA workflow test
run: |
python runner.py --wf BTA --json metadata/test_bta_run3.json --campaign Summer22Run3 --executor iterative --overwrite
python runner.py --wf BTA --json metadata/test_bta_run3.json --executor iterative --overwrite
- name: BTA_ttbar workflow test
run: |
python runner.py --wf BTA_ttbar --json metadata/test_bta_run3.json --campaign Summer22Run3 --executor iterative --overwrite
python runner.py --wf BTA_ttbar --json metadata/test_bta_run3.json --executor iterative --overwrite
7 changes: 4 additions & 3 deletions .github/workflows/ctag_DY_workflow.yml
@@ -24,7 +24,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: ["3.8","3.10"]
python-version: ["3.10"]

defaults:
run:
@@ -86,6 +86,7 @@ jobs:
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
@@ -109,7 +110,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ctag_DY_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 --year 2022 $opts
python runner.py --workflow ctag_DY_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts

- name: ctag electron DY workflows with correctionlib
@@ -125,4 +126,4 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ectag_DY_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 --year 2022 $opts
python runner.py --workflow ectag_DY_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
8 changes: 4 additions & 4 deletions .github/workflows/ctag_Wc_workflow.yml
@@ -26,7 +26,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: ["3.8","3.10"]
python-version: ["3.10"]

defaults:
run:
@@ -90,7 +90,7 @@ jobs:
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
xrdcp root://eoscms.cern.ch//eos/cms/store/group/phys_btag/nano-commissioning/test_w_dj.root .
@@ -113,7 +113,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ctag_Wc_sf --json metadata/test_bta_run3.json --campaign Summer22Run3 --executor iterative $opts
python runner.py --workflow ctag_Wc_sf --json metadata/test_bta_run3.json --executor iterative $opts
- name: ctag electron W+c workflows with correctionlib
run: |
@@ -128,4 +128,4 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ectag_Wc_sf --json metadata/test_bta_run3.json --executor iterative --campaign Summer22Run3 --overwrite $opts
python runner.py --workflow ectag_Wc_sf --json metadata/test_bta_run3.json --executor iterative --overwrite $opts
13 changes: 7 additions & 6 deletions .github/workflows/ctag_ttbar_workflow.yml
@@ -24,7 +24,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: ["3.8","3.10"]
python-version: ["3.10"]

defaults:
run:
@@ -87,6 +87,7 @@ jobs:
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
@@ -110,7 +111,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ctag_ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow ctag_ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
- name: ctag semileptonic electron ttbar workflows with correctionlib
run: |
@@ -125,7 +126,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ectag_ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow ectag_ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
- name: ctag dileptonic muon ttbar workflows with correctionlib
run: |
@@ -140,7 +141,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow ctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
- name: ctag dileptonic electron ttbar workflows with correctionlib
run: |
string=$(git log -1 --pretty=format:'%s')
@@ -154,7 +155,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ectag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow ectag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
- name: ctag dileptonic emu ttbar workflows with correctionlib
run: |
string=$(git log -1 --pretty=format:'%s')
@@ -168,5 +169,5 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow emctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow emctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
8 changes: 4 additions & 4 deletions .github/workflows/ttbar_workflow.yml
@@ -24,7 +24,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: ["3.8","3.10"]
python-version: ["3.10"]

defaults:
run:
@@ -87,7 +87,7 @@ jobs:
chmod 400 $HOME/.globus/userkey.pem
openssl rand -out $HOME/.rnd -hex 256
printf "${{secrets.GRID_PASSWORD}}" | voms-proxy-init --voms cms --vomses ${X509_VOMSES} --debug --pwstdin
chmod 755 /usr/share/miniconda3/envs/btv_coffea/etc/grid-security/certificates
- name: Test xrootd
run: |
xrdcp root://eoscms.cern.ch//eos/cms/store/group/phys_btag/nano-commissioning/test_w_dj.root .
@@ -109,7 +109,7 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow ttsemilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts

- name: btag dileptonic ttbar workflows with correctionlib
@@ -125,5 +125,5 @@ jobs:
elif [[ $string == *"ci:weight_only"* ]]; then
opts=$(echo "$opts" | sed 's/--isSyst all/--isSyst weight_only/g')
fi
python runner.py --workflow ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 $opts
python runner.py --workflow ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative $opts
37 changes: 23 additions & 14 deletions README.md
@@ -56,7 +56,7 @@ along with the fileset these should run over. Multiple executors can be chosen

To test a small set of files to see whether the workflows run smoothly, run:
```
python runner.py --workflow ttsemilep_sf --json metadata/test_bta_run3.json --campaign Summer22EERun3 --year 2022
python runner.py --workflow ttsemilep_sf --json metadata/test_bta_run3.json --campaign Summer23 --year 2023
```

More options for `runner.py`
@@ -73,7 +73,7 @@ More options for `runner.py`
(default: dummy_samples.json)
--year YEAR Year
--campaign CAMPAIGN Dataset campaign, change the corresponding correction
files{ "Rereco17_94X","Winter22Run3","Summer22Run3","Summer22EERun3","2018_UL","2017_UL","2016preVFP_UL","2016postVFP_UL"}
files{ "Rereco17_94X","Winter22Run3","Summer23","Summer23BPix","Summer22","Summer22EE","2018_UL","2017_UL","2016preVFP_UL","2016postVFP_UL"}
--isSyst Run with systematics, all, weight_only(no JERC uncertainties included),JERC_split, None(not extract)
--isArray Output root files
--noHist Not save histogram coffea files
@@ -145,13 +145,13 @@ After a small test, you can run the full campaign for a dedicated phase space, s
- Dileptonic ttbar phase space : check performance for btag SFs, emu channel

```
python runner.py --workflow ttdilep_sf --json metadata/data_Summer22_Run3_2022_em_BTV_Run3_2022_Comm_MINIAODv4_NanoV12.json --campaign Summer22Run3 --year 2022 (--executor ${scaleout_site})
python runner.py --workflow ttdilep_sf --json metadata/data_Summer23_2023_em_BTV_Run3_2023_Comm_MINIAODv4_NanoV12.json --campaign Summer23 --year 2023 (--executor ${scaleout_site})
```

- Semileptonic ttbar phase space : check performance for btag SFs, muon channel

```
python runner.py --workflow ttsemilep_sf --json metadata/data_Summer22_Run3_2022_mu_BTV_Run3_2022_Comm_MINIAODv4_NanoV12.json --campaign Summer22Run3 --year 2022 (--executor ${scaleout_site})
python runner.py --workflow ttsemilep_sf --json metadata/data_Summer23_2023_mu_BTV_Run3_2023_Comm_MINIAODv4_NanoV12.json --campaign Summer23 --year 2023 (--executor ${scaleout_site})
```

</p>
@@ -164,26 +164,26 @@ python runner.py --workflow ttsemilep_sf --json metadata/data_Summer22_Run3_2022
- Dileptonic ttbar phase space : check performance for charm SFs, bjets enriched SFs, muon channel

```
python runner.py --workflow ctag_ttdilep_sf --json metadata/data_Summer22_2022_mu_BTV_Run3_2022_Comm_MINIAODv4_NanoV12.json --campaign Summer22Run3 --year 2022(--executor ${scaleout_site})
python runner.py --workflow ctag_ttdilep_sf --json metadata/data_Summer23_2023_mu_BTV_Run3_2023_Comm_MINIAODv4_NanoV12.json --campaign Summer23 --year 2023(--executor ${scaleout_site})
```


- Semileptonic ttbar phase space : check performance for charm SFs, bjets enriched SFs, muon channel

```
python runner.py --workflow ctag_ttsemilep_sf --json metadata/data_Summer22_2022_mu_BTV_Run3_2022_Comm_MINIAODv4_NanoV12.json --campaign Summer22Run3 --year 2022(--executor ${scaleout_site})
python runner.py --workflow ctag_ttsemilep_sf --json metadata/data_Summer23_2023_mu_BTV_Run3_2023_Comm_MINIAODv4_NanoV12.json --campaign Summer23 --year 2023(--executor ${scaleout_site})
```

- W+c phase space : check performance for charm SFs, cjets enriched SFs, muon channel

```
python runner.py --workflow ctag_Wc_sf --json metadata/data_Summer22_2022_mu_BTV_Run3_2022_Comm_MINIAODv4_NanoV12.json --campaign Summer22Run3 --year 2022(--executor ${scaleout_site})
python runner.py --workflow ctag_Wc_sf --json metadata/data_Summer23_2023_mu_BTV_Run3_2023_Comm_MINIAODv4_NanoV12.json --campaign Summer23 --year 2023(--executor ${scaleout_site})
```

- DY phase space : check performance for charm SFs, light jets enriched SFs, muon channel

```
python runner.py --workflow ctag_DY_sf --json metadata/data_Summer22_2022_mu_BTV_Run3_2022_Comm_MINIAODv4_NanoV12.json --campaign Summer22Run3 --year 2022(--executor ${scaleout_site})
python runner.py --workflow ctag_DY_sf --json metadata/data_Summer23_2023_mu_BTV_Run3_2023_Comm_MINIAODv4_NanoV12.json --campaign Summer23 --year 2023(--executor ${scaleout_site})
```

</p>
@@ -208,24 +208,24 @@ python runner.py --workflow valid --json metadata/$json file

Based on Congqiao's [development](notebooks/BTA_array_producer.ipynb) to produce BTA ntuples based on PFNano.

:exclamation: Only the newest version [BTV_Run3_2022_Comm_MINIAODv4](https://github.com/cms-btv-pog/btvnano-prod) ntuples work. Example files are given in [this](metadata/test_bta_run3.json) json. Optimize the chunksize (`--chunk`) with respect to memory usage. This depends on the sample, e.g. whether it has a huge jet collection or many b/c hadrons. The more info you store, the more memory you need. I would suggest testing with `iterative` to estimate the size.
:exclamation: Only the newest version [BTV_Run3_2023_Comm_MINIAODv4](https://github.com/cms-btv-pog/btvnano-prod) ntuples work. Example files are given in [this](metadata/test_bta_run3.json) json. Optimize the chunksize (`--chunk`) with respect to memory usage. This depends on the sample, e.g. whether it has a huge jet collection or many b/c hadrons. The more info you store, the more memory you need. I would suggest testing with `iterative` to estimate the size.

<details><summary>details</summary>
<p>

Run with the nominal `BTA` workflow to include the basic event variables, jet observables, and GEN-level quarks, hadrons, leptons, and V0 variables.
```
python runner.py --wf BTA --json metadata/test_bta_run3.json --campaign Summer22EERun3 --isJERC
python runner.py --wf BTA --json metadata/test_bta_run3.json --campaign Summer22EE --isJERC
```

Run with the `BTA_addPFMuons` workflow to additionally include the `PFMuon` and `TrkInc` collection, used by the b-tag SF derivation with the QCD(μ) methods.
```
python runner.py --wf BTA_addPFMuons --json metadata/test_bta_run3.json --campaign Summer22EERun3 --isJERC
python runner.py --wf BTA_addPFMuons --json metadata/test_bta_run3.json --campaign Summer22EE --isJERC
```

Run with the `BTA_addAllTracks` workflow to additionally include the `Tracks` collection, used by the JP variable calibration.
```
python runner.py --wf BTA_addAllTracks --json metadata/test_bta_run3.json --campaign Summer22EERun3 --isJERC
python runner.py --wf BTA_addAllTracks --json metadata/test_bta_run3.json --campaign Summer22EE --isJERC
```

</p>
@@ -239,6 +239,10 @@ However, some sites have certain restrictions for various reasons, in particular


Memory usage is also worth checking when adapting jobs to a cluster. Call `memory_usage_psutil()` from `helpers.func.memory_usage_psutil` to check the memory and optimize the job size. An example with `ectag_Wc_sf` is summarized below.

<details><summary>details</summary>
<p>

| Type | Array+Hist | Hist only | Array only |
| :---: | :---: | :---: | :---: |
| DoubleMuon (BTA,BTV_Comm_v2) | 1243MB | 848MB | 1249MB |
@@ -248,6 +252,8 @@ WJets_inc (BTA,BTV_Comm_v2)| 1243MB |848MB |1249MB|
| WJets_inc (PFCands, BTV_Comm_v1) | 1650MB | 1274MB | 1632MB |
| WJets_inc (Nano_v11) | 1183MB | 630MB | 1180MB |

</p>
</details>
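
A quick memory check of this kind can be sketched as follows. This is not the repository's `memory_usage_psutil()` helper, whose implementation is not shown in this diff; `current_memory_mb` is a hypothetical name, and the fallback to the standard-library `resource` module is an assumption for environments where `psutil` is not installed.

```python
import os
import resource
import sys


def current_memory_mb():
    """Return this process's memory footprint in MB (hypothetical helper)."""
    try:
        import psutil  # third-party; assumed to be available in the analysis env
    except ImportError:
        psutil = None

    if psutil is not None:
        # Resident set size of the current process, in bytes.
        rss_bytes = psutil.Process(os.getpid()).memory_info().rss
        return rss_bytes / 1024**2

    # Fallback: peak resident set size from the standard library (Unix only).
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / 1024**2 if sys.platform == "darwin" else peak / 1024


if __name__ == "__main__":
    print(f"memory in use: {current_memory_mb():.0f} MB")
```

Calling such a function before and after processing one chunk with the `iterative` executor gives a rough per-chunk figure to compare against the table.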

### Sites configuration with dask/parsl scheduler

@@ -318,6 +324,8 @@ This utility is currently adapted for the lxplus and cmsconnect condor systems.

After executing the command, a new folder will be created, preparing the submission. Follow the on-screen instructions and utilize `condor_submit ...` to submit the jdl file. The output will be transferred to the designated XRootD destination.

To resubmit failed jobs, run the script provided by Pablo, `script/missingFiles.py`, from the original job folder.

<details><summary>Frequent issues for standalone condor jobs submission
</summary>
<p>
@@ -333,6 +341,7 @@ After executing the command, a new folder will be created, preparing the submiss
</details>



## Make the dataset json files

Use `fetch.py` in the `scripts/` folder to obtain the json files for your samples. Create an `$input_list`, which can be a list of datasets taken from CMS DAS, and the script produces a json containing `dataset_name:[filelist]`. For samples not published in CMS DAS, specify the local path in the input list instead.
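
The `dataset_name:[filelist]` layout for the local-path case can be illustrated with a short sketch. It is not the `fetch.py` implementation and does not query CMS DAS; `build_fileset` and `write_fileset` are hypothetical names, and the input is assumed to be a mapping from dataset name to a local directory holding `.root` files.

```python
import json
from pathlib import Path


def build_fileset(local_samples):
    """Map each dataset name to a sorted list of the ROOT files in its directory."""
    fileset = {}
    for dataset_name, directory in local_samples.items():
        # Only files ending in .root directly inside the directory are collected.
        fileset[dataset_name] = sorted(str(p) for p in Path(directory).glob("*.root"))
    return fileset


def write_fileset(fileset, output_path):
    """Write the mapping as json in the dataset_name:[filelist] layout."""
    with open(output_path, "w") as handle:
        json.dump(fileset, handle, indent=4)
```

A json written this way has the same shape as the files passed to `runner.py` through `--json`.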
@@ -481,7 +490,7 @@ Compile correction pickle files for a specific JEC campaign by changing the dict

```
python -m BTVNanoCommissioning.utils.compile_jec ${campaign} jec_compiled
e.g. python -m BTVNanoCommissioning.utils.compile_jec Summer22Run3 jec_compiled
e.g. python -m BTVNanoCommissioning.utils.compile_jec Summer23 jec_compiled
```


@@ -645,7 +654,7 @@ Yout can find the secret configuration in the direcotry : `Settings>>Secrets>>Ac
Special flags in the commit message head make the Actions run different commands (add the flag in front of your commit message).
The default configuration runs:
```
python runner.py --workflow emctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer22Run3 --isArray --isSyst all
python runner.py --workflow emctag_ttdilep_sf --json metadata/test_bta_run3.json --limit 1 --executor iterative --campaign Summer23 --isArray --isSyst all
```

- `[skip ci]` in the commit message: do not run CI at all
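
The commit-message switch used by the CI jobs can be sketched in Python as follows. Only the `ci:weight_only` branch is visible in the workflow files of this commit, so the sketch covers that branch alone; `opts_for_commit` is a hypothetical name, and the default option string is the `--isArray --isSyst all` tail of the default command quoted in the README.

```python
DEFAULT_OPTS = "--isArray --isSyst all"


def opts_for_commit(subject, opts=DEFAULT_OPTS):
    """Adjust the runner options according to a flag in the commit subject."""
    # Mirrors the shell test `[[ $string == *"ci:weight_only"* ]]` followed by
    # the sed substitution of `--isSyst all` with `--isSyst weight_only`.
    if "ci:weight_only" in subject:
        return opts.replace("--isSyst all", "--isSyst weight_only")
    return opts
```

In the workflows the subject comes from `git log -1 --pretty=format:'%s'`, and the resulting options are appended to the `python runner.py ...` call as `$opts`.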
5 changes: 4 additions & 1 deletion condor/execute.sh
@@ -62,7 +62,10 @@ OPTS="--wf ${ARGS[workflow]} --year ${ARGS[year]} --campaign ${ARGS[campaign]} -
if [ "${ARGS[voms]}" != "null" ]; then
OPTS="$OPTS --voms ${ARGS[voms]}"
fi
for key in isSyst isArray noHist overwrite skipbadfiles; do
if [ "${ARGS[isSyst]}" != "null" ]; then
OPTS="$OPTS --isSyst ${ARGS[isSyst]}"
fi
for key in isArray noHist overwrite skipbadfiles; do
if [ "${ARGS[$key]}" == true ]; then
OPTS="$OPTS --$key"
fi
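
The option handling in `condor/execute.sh` after this change can be mirrored in Python as a rough sketch. `build_opts` is a hypothetical name, the job arguments are assumed to arrive as strings with `"null"` marking an unset value (as in the shell script), and only the options visible in this hunk are covered because the rest of the `OPTS` line is truncated in the diff.

```python
from typing import Dict, List

# Boolean switches passed without a value, as in the updated loop.
FLAG_KEYS = ("isArray", "noHist", "overwrite", "skipbadfiles")


def build_opts(args: Dict[str, str]) -> List[str]:
    """Assemble runner options from job arguments (illustrative only)."""
    opts = [
        "--wf", args["workflow"],
        "--year", args["year"],
        "--campaign", args["campaign"],
    ]
    if args.get("voms", "null") != "null":
        opts += ["--voms", args["voms"]]
    # isSyst now carries a value (e.g. "all" or "weight_only") instead of
    # being treated as a bare on/off flag.
    if args.get("isSyst", "null") != "null":
        opts += ["--isSyst", args["isSyst"]]
    for key in FLAG_KEYS:
        if args.get(key) == "true":
            opts.append(f"--{key}")
    return opts
```

The point of the change is visible in the `isSyst` branch: before it, `isSyst` sat in the boolean loop and its value could not be forwarded to `runner.py`.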