This is the repository for the single-stop analyzer, used by the UMN-CMS group's single-stop analysis group. It contains a python package designed to allow the definition and execution of the single-stop analysis. This includes:
- Definitions of all datasets used
- Region definitions
- Descriptions of histograms and other analysis artifacts
- Handling of MC weights
- Handling of systematics, both for scale and shape
- Automatic scale-out with dask and condor
- Postprocessing utilities for creating plots, scale factors, and more
To begin, clone the repository to your desired location:
git clone [email protected]:UMN-CMS/SingleStopCoffea.git
Then follow the instructions to get set up.
If you have access to CVMFS, the easiest way to get started is to simply run
source setup.sh
This runs a setup script that creates a complete environment. The same environment is used on worker nodes.
If this is the first time you have run the analyzer, you will also need to populate the replica cache. The following command queries rucio to find the physical location of files based on their path names.
analyzer generate-replicas
This may take some time as we find all locations for the files in our datasets.
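The idea of a replica cache can be sketched in a few lines of Python. This is a minimal illustration and not the analyzer's actual implementation: `build_replica_cache`, `fake_lookup`, the JSON file layout, and the site names are all hypothetical, and `lookup` stands in for whatever catalogue query is really used.

```python
import json
import tempfile
from pathlib import Path


def build_replica_cache(logical_names, lookup, cache_path):
    """Resolve each logical file name once and store the result on disk.

    `lookup` is a hypothetical stand-in for a catalogue query; it takes a
    logical name and returns a list of physical locations.
    """
    cache_path = Path(cache_path)
    # Reuse earlier results so repeated runs only query files not yet cached.
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    for name in logical_names:
        if name not in cache:
            cache[name] = list(lookup(name))
    cache_path.write_text(json.dumps(cache, indent=2))
    return cache


def fake_lookup(name):
    # Hypothetical: pretend every file has replicas at two made-up sites.
    return [f"root://site-a.example/{name}", f"root://site-b.example/{name}"]


cache_file = Path(tempfile.mkdtemp()) / "replicas.json"
replicas = build_replica_cache(["/store/a.root"], fake_lookup, cache_file)
```

Because results are stored on disk, only the first run pays the cost of querying every file; later runs look up only names that are not yet cached.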
You can run the complete analysis in a single command:
python3 -m analyzer run configurations/<YOUR_CONFIG>.yaml -o results/my_results_file.pkl -e <EXECUTOR_CHOICE>
This will run the analysis defined by the configuration file <YOUR_CONFIG>.yaml using the chosen executor.
Of course, this will be very slow, since the complete analysis is processing billions of events.
You can speed things up by specifying a distributed computation system.
To run on condor using 100 workers, each with 4GB of memory, you can run
analyzer run-analysis configurations/single_stop_complete.yaml -o results/my_results_file.pkl -t lpccondor -w 100 -m 4GB
While developing, you may instead want to use a different configuration and run locally:
analyzer run-analysis configurations/my_personal_configuration.yaml -o results/my_results_file.pkl -t local
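To give a feel for why distributing the work helps, here is a rough sketch of the general pattern: split the events into chunks, process each chunk independently, then merge the partial histograms. It uses a thread pool from the Python standard library purely as a stand-in; it is not dask or condor, and `process_chunk`, `run_chunks`, and the toy binning are hypothetical names chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_chunk(events):
    """Fill a tiny 'histogram' (bin -> count) from one chunk of events."""
    # Hypothetical observable: place each value into a unit-width bin.
    return Counter(int(value) for value in events)


def run_chunks(chunks, workers):
    """Process chunks in parallel, then merge the partial histograms."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts bin by bin
    return total


chunks = [[0.5, 1.2, 1.7], [1.1, 2.9], [0.1]]
histogram = run_chunks(chunks, workers=2)
```

Since each chunk is independent and merging is a simple sum, the same structure carries over to many workers on a cluster.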
Post-processing is done using the post-process subcommand.
analyzer post-process configuration/<YOUR_CONFIGURATION>.yaml
The post-processor automatically handles things like scaling of MC histograms. Like the main analysis, post-processing is configuration driven. The configuration YAML defines which histograms and other graphs should be produced, how they should be saved, etc.
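As an illustration of what scaling an MC histogram usually means, the sketch below applies the common normalization of cross section times integrated luminosity divided by the sum of generated weights. This is an assumption about the general technique, not a description of what the post-processor does internally; the function names and the numbers are made up.

```python
def mc_scale_factor(cross_section_pb, luminosity_inv_pb, sum_of_weights):
    """Factor that normalizes an MC sample to its expected event yield."""
    if sum_of_weights <= 0:
        raise ValueError("sum_of_weights must be positive")
    return cross_section_pb * luminosity_inv_pb / sum_of_weights


def scale_histogram(bin_contents, factor):
    """Multiply every bin of a histogram by the same factor."""
    return [content * factor for content in bin_contents]


# Made-up numbers for illustration only.
factor = mc_scale_factor(
    cross_section_pb=2.0, luminosity_inv_pb=1000.0, sum_of_weights=4000.0
)
scaled = scale_histogram([10.0, 20.0, 0.0], factor)
```

With these inputs the factor is 0.5, so every bin is halved.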
Configuration files are yaml files that describe what you want to run. The general format