UMN-CMS/SingleStopCoffea

Introduction

This is the repository for the single-stop analyzer, used by the UMN-CMS single-stop analysis group. It contains a Python package designed to allow the definition and execution of the single-stop analysis. This includes:

  • Definitions of all datasets used
  • Region definitions
  • Descriptions of histograms and other analysis artifacts
  • Handling of MC weights
  • Handling of systematics, both for scale and shape
  • Automatic scale-out with Dask and HTCondor
  • Postprocessing utilities for creating plots, scale factors, and more

Installation

To begin, clone the repository to your desired location:

git clone git@github.com:UMN-CMS/SingleStopCoffea.git

Then follow the instructions below to get set up.

On a system with CVMFS

If you have access to CVMFS, the easiest way to get started is to simply run

source setup.sh

This runs a setup script that creates a complete environment; the same environment is used on the worker nodes.

If this is the first time you have run the analyzer, you will also need to populate the replica cache using the command below. This will query Rucio to find the physical locations of files based on their path names.

analyzer generate-replicas

This may take some time as we find all locations for the files in our datasets.

Running the Analyzer

You can run the complete analysis in a single command:

python3 -m analyzer run configurations/<YOUR_CONFIG>.yaml -o results/my_results_file.pkl -e <EXECUTOR_CHOICE>

This will run the analysis defined by the configuration file <YOUR_CONFIG>.yaml using the chosen executor. Of course, this will be very slow, since the complete analysis processes billions of events. You can speed things up by specifying a distributed computation system. To run on condor with 100 workers, each with 4 GB of memory, you can run

analyzer run-analysis configurations/single_stop_complete.yaml -o results/my_results_file.pkl -t lpccondor -w 100 -m 4GB

While developing, you may instead want to use a different configuration and run with the local executor:

analyzer run-analysis configurations/my_personal_configuration.yaml -o results/my_results_file.pkl -t local

Inspecting and Processing Results

Post-processing is done using the post-process subcommand:

analyzer post-process configurations/<YOUR_CONFIGURATION>.yaml

The post-processor will automatically handle things like the scaling of MC histograms. Like the main analysis, post-processing is configuration driven: the configuration YAML defines which histograms and other plots should be produced, how they should be saved, and so on.
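
For a quick, manual look at the raw analyzer output, you can also open the results file directly in Python. The snippet below is only a minimal sketch: it assumes the file written with -o is a standard Python pickle and that you are working inside the analyzer environment (for example, the one created by setup.sh) so that any custom classes it contains can be unpickled.

import pickle

# Hypothetical quick inspection of the raw analyzer output; the path matches
# the -o argument used in the examples above, so adjust it to your own file.
with open("results/my_results_file.pkl", "rb") as f:
    results = pickle.load(f)

# Print the top-level structure to get a sense of what was produced.
print(type(results))
if hasattr(results, "keys"):
    for key in results.keys():
        print(key)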

Configuration In Depth

Configuration files are YAML files that describe what you want to run. For the general format, see the example configurations in the configurations/ directory (for example, configurations/single_stop_complete.yaml).

