Implementation: Scale factors
This page explains how scale factors are applied in the framework. In almost all cases, scale factors are read as 1- or 2-dimensional histograms from a ROOT file, with the axes representing input variables such as a lepton pt, and the bin content representing the value of the scale factor.
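Reading a scale factor from a 1-D histogram comes down to finding the bin that contains the input value and returning that bin's content. The sketch below shows this with plain Python lists instead of a ROOT histogram. The function name `lookup_1d` and the binning and values are made up for illustration. Clamping out-of-range inputs into the first or last bin is an assumption of this sketch; check how the framework's evaluator treats values outside the histogram range.

```python
from bisect import bisect_right


def lookup_1d(edges, contents, x):
    """Return the content of the bin containing x (hypothetical helper).

    edges has len(contents) + 1 entries. Values outside the histogram range
    are clamped into the first/last bin (an assumption of this sketch).
    """
    # bisect_right returns the index of the first edge strictly greater than x,
    # so subtracting 1 gives the bin i with edges[i] <= x < edges[i + 1].
    index = bisect_right(edges, x) - 1
    # Clamp under- and overflow into the edge bins.
    index = min(max(index, 0), len(contents) - 1)
    return contents[index]


# Made-up SF binned in lepton pt: [20, 50), [50, 100), [100, 200)
pt_edges = [20.0, 50.0, 100.0, 200.0]
sf_values = [0.97, 0.99, 1.01]

print(lookup_1d(pt_edges, sf_values, 75.0))   # 0.99
print(lookup_1d(pt_edges, sf_values, 500.0))  # 1.01 (overflow clamped)
```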
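The implementation has two steps: the scale factor is first defined in the configuration file, and then accessed and applied in the processor.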
The configuration file is used to separate specific cut values, input files, etc., from the physics logic, which is implemented in the processors. For general information on this topic, have a look at the configuration wiki page. Scale factors are defined in the `sf` section of the config file. Example:
```yaml
default:
  # (...)
  sf:
    ewk_nlo_w:
      histogram: kfactor_monojet_ewk
      file: data/sf/theory/merged_kfactors_wjets.root
  # (...)
```
Here, `ewk_nlo_w` is an arbitrary name that we use to identify this scale factor within our framework; it must be unique for each scale factor. For each scale factor, we specify the `file` and `histogram` parameters, which give the path to the ROOT file that contains the k factor (`data/sf/theory/merged_kfactors_wjets.root`) and the name under which the histogram is saved in that file (`kfactor_monojet_ewk`). When the processors are run, the list of scale factors in the config is parsed, and all scale factors are loaded and made available in the `evaluator` object used in the processor. The heavy lifting behind the evaluator is implemented in the coffea framework (see here), and configuration parsing is implemented here, but you don't need to worry about either of those for now.
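Conceptually, building the evaluator means turning each entry of the `sf` section into a named lookup function. The sketch below imitates that with a hypothetical `build_evaluator` function and an in-memory `HISTOGRAMS` dictionary standing in for the ROOT files; it is not the framework's `evaluator_from_config`, which delegates the actual loading to coffea. The bin edges and k factor values are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical stand-in for the ROOT file contents:
# (file path, histogram name) -> (bin edges, bin contents). Values are made up.
HISTOGRAMS = {
    ("data/sf/theory/merged_kfactors_wjets.root", "kfactor_monojet_ewk"): (
        [200.0, 400.0, 600.0, 1000.0],
        [0.95, 0.90, 0.85],
    ),
}

# The sf section of the config, as it would look after YAML parsing.
sf_config = {
    "ewk_nlo_w": {
        "histogram": "kfactor_monojet_ewk",
        "file": "data/sf/theory/merged_kfactors_wjets.root",
    },
}


def make_lookup(edges, contents):
    """Return a function mapping x to its bin content (clamped to the range)."""
    def lookup(x):
        index = min(max(bisect_right(edges, x) - 1, 0), len(contents) - 1)
        return contents[index]
    return lookup


def build_evaluator(sf_config):
    """Hypothetical, simplified analogue of evaluator_from_config."""
    evaluator = {}
    for name, entry in sf_config.items():
        edges, contents = HISTOGRAMS[(entry["file"], entry["histogram"])]
        evaluator[name] = make_lookup(edges, contents)
    return evaluator


evaluator = build_evaluator(sf_config)
print(evaluator["ewk_nlo_w"](450.0))  # 0.9
```

Keying the result by the config name is what makes the "unique name" requirement matter: two entries with the same name cannot coexist in the mapping.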
Note that the example config shown here has the `sf` section nested inside the `default` section. This means that the scale factor definition is used for all years of data taking unless it is overridden in the year-dependent part of the configuration.
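The override behavior can be pictured as a recursive merge of the year-dependent section on top of `default`. The sketch below is only an illustration of that semantic with a hypothetical `merge` function; the framework's configuration handling may differ. In particular, merging key by key (so a year section that only sets `histogram` would inherit `file` from `default`) is an assumption of this sketch. The 2018 section and the muon file and histogram names are invented.

```python
import copy


def merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


config = {
    "default": {
        "sf": {
            "ewk_nlo_w": {
                "histogram": "kfactor_monojet_ewk",
                "file": "data/sf/theory/merged_kfactors_wjets.root",
            },
            # Hypothetical muon SF entry
            "muon_id_tight": {"histogram": "sf_default", "file": "muon_default.root"},
        }
    },
    # Hypothetical year-dependent section overriding one SF for 2018
    "2018": {
        "sf": {
            "muon_id_tight": {"histogram": "sf_2018", "file": "muon_2018.root"},
        }
    },
}

cfg_2018 = merge(config["default"], config.get("2018", {}))
# No 2017 section: the default definitions are used unchanged.
cfg_2017 = merge(config["default"], config.get("2017", {}))
```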
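In the processor, the scale factors are then available through the `evaluator` object. Simply pass the name of the k factor you have defined in the configuration, and call the result with the input variable:

```python
evaluator = evaluator_from_config(cfg)
# (...)
# Look up the scale factor by its config name and evaluate it at the gen-level boson pt
my_sf = evaluator["ewk_nlo_w"](gen_v_pt)
```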
In this case, the k factor is defined as a function of the gen-level boson pt, which we pass in the `gen_v_pt` variable. You can also have 2-D weights, e.g.:
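```python
# Select the tight muons in each event
tight_muons = muons[df['is_tight_muon']]
# 2-D lookup with x = pt and y = |eta|; .prod() multiplies the per-muon SFs into one weight per event
muon_tight_sf = evaluator['muon_id_tight'](tight_muons.pt, tight_muons.abseta).prod()
```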
Here, an additional complication arises: the evaluator returns a separate SF for each tight muon in the event (there may be several). To get one weight per event, we multiply the weights of all muons in the event (`.prod()`).
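The per-muon lookup followed by a per-event product can be sketched with plain nested lists in place of jagged arrays. The helper names `find_bin` and `lookup_2d`, the binning, and the SF values are all hypothetical. Note that an event with no tight muons gets a weight of 1, since the product over an empty set is 1.

```python
import math
from bisect import bisect_right


def find_bin(edges, x):
    """Index of the bin containing x, clamped to the valid range."""
    return min(max(bisect_right(edges, x) - 1, 0), len(edges) - 2)


def lookup_2d(x_edges, y_edges, contents, x, y):
    """contents[i][j] is the SF for x bin i and y bin j."""
    return contents[find_bin(x_edges, x)][find_bin(y_edges, y)]


# Hypothetical tight-muon ID SF binned in pt (x) and |eta| (y)
pt_edges = [20.0, 50.0, 200.0]
abseta_edges = [0.0, 1.2, 2.4]
sf = [[0.98, 0.96],   # 20 <= pt < 50
      [0.99, 0.97]]   # 50 <= pt < 200

# Jagged input: one list of (pt, |eta|) pairs per event
events = [
    [(30.0, 0.5), (80.0, 1.5)],  # two tight muons
    [(60.0, 0.3)],               # one tight muon
    [],                          # no tight muons
]

# One SF per muon, multiplied into one weight per event
per_event_sf = [
    math.prod(lookup_2d(pt_edges, abseta_edges, sf, pt, abseta) for pt, abseta in muons)
    for muons in events
]
print(per_event_sf)
```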
The many event weights needed in an analysis are organized using the `coffea.processor.Weights` class.
We simply need to register our weight in the `Weights` object:

```python
# (...)
# Register the generator weight under the name 'gen'
weights.add('gen', df['Generator_weight'])
# (...)
```
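To make the bookkeeping concrete, the sketch below is a minimal, hypothetical `SimpleWeights` class: it stores one per-event weight list per name and returns their per-event product. It is not coffea's implementation; the real `Weights` class also tracks up/down systematic variations, and offers a `partial_weight` method for products over a subset of weights, which the `exclude` argument here only loosely imitates. The weight values are made up.

```python
import math


class SimpleWeights:
    """Hypothetical, minimal stand-in for coffea.processor.Weights."""

    def __init__(self, size):
        self._size = size      # number of events
        self._weights = {}     # name -> list of per-event weights

    def add(self, name, weight):
        if len(weight) != self._size:
            raise ValueError(f"weight '{name}' has {len(weight)} entries, expected {self._size}")
        self._weights[name] = list(weight)

    def weight(self, exclude=()):
        """Per-event product of all registered weights, optionally skipping some."""
        selected = [w for name, w in self._weights.items() if name not in exclude]
        if not selected:
            return [1.0] * self._size
        # zip(*selected) yields one tuple of weights per event
        return [math.prod(values) for values in zip(*selected)]


weights = SimpleWeights(size=3)
weights.add("gen", [1.0, -1.0, 1.0])             # e.g. df['Generator_weight']
weights.add("ewk_nlo_w", [0.95, 0.90, 0.85])     # e.g. the k factor per event
print(weights.weight())                          # [0.95, -0.9, 0.85]
print(weights.weight(exclude=("ewk_nlo_w",)))    # [1.0, -1.0, 1.0]
```

Keeping each correction under its own name is what lets you compare distributions with and without a given weight.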
Currently, the weights are added to the `Weights` object by separate functions defined here. This has the advantage that the functions can be imported into different processors, which reduces code duplication. Some corrections, e.g. theory k factors, should only be applied to some of the samples. This is implemented by storing dataset categorization information in the data frame. For example, `df['is_lo_w']` indicates whether or not the dataset corresponds to LO W, `df['is_lo_z']` is the same for Z, etc. These category identifiers are defined at run time in the processor.
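A dataset-dependent correction can be sketched as a function that checks the category flag before evaluating the k factor. The function `ewk_kfactor_weight` and the toy k factor are hypothetical, and the sketch assumes that `df['is_lo_w']` holds a single boolean for the whole dataset. Returning unit weights for other datasets (rather than skipping the `weights.add` call) is one possible design choice: it keeps the same set of weight names registered for every dataset.

```python
def ewk_kfactor_weight(df, kfactor_lookup, gen_v_pt):
    """Per-event EWK k factor, applied only to LO W samples (hypothetical helper).

    Assumes df['is_lo_w'] is a single bool for the whole dataset.
    Datasets that are not LO W get a weight of 1 for every event.
    """
    if df["is_lo_w"]:
        return [kfactor_lookup(pt) for pt in gen_v_pt]
    return [1.0] * len(gen_v_pt)


def toy_kfactor(pt):
    # Hypothetical k factor: made-up values, smaller above 500 GeV
    return 0.95 if pt < 500.0 else 0.85


gen_v_pt = [300.0, 700.0]
print(ewk_kfactor_weight({"is_lo_w": True}, toy_kfactor, gen_v_pt))   # [0.95, 0.85]
print(ewk_kfactor_weight({"is_lo_w": False}, toy_kfactor, gen_v_pt))  # [1.0, 1.0]
```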
To summarize, adding a new scale factor involves the following steps:
- Add the ROOT file with the corrections to an appropriate directory in the repository.
- Define the scale factor in the configuration file.
- Access your new weight in the processor and add it to the Weights object, either in an existing function, in a new function, or just in the body of your processor. If your new correction should be applied only to some datasets, remember to check the dataset name / type before applying.
Be careful about the input variables you pass to the evaluator object. Especially for 2-dimensional scale factors, make sure that you pass the inputs in the right order (x first, then y); otherwise you will get nonsensical results. When adding a new weight, always make a plot before and after applying it, so that you can see what effect the weight has. It is good practice to save these plots for later reference.
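The axis-order pitfall is easy to reproduce. In the sketch below, swapping pt and |eta| does not raise an error; with clamping (an assumption of this sketch, as above) both values simply land in edge bins and a plausible-looking but wrong SF comes back. One possible safeguard, shown as the hypothetical `muon_sf` wrapper, is a keyword-only function so that each call site names its axes explicitly. The binning and values are made up.

```python
from bisect import bisect_right


def find_bin(edges, x):
    """Index of the bin containing x, clamped to the valid range."""
    return min(max(bisect_right(edges, x) - 1, 0), len(edges) - 2)


# Hypothetical muon SF: x axis = pt, y axis = |eta|
PT_EDGES = [20.0, 50.0, 200.0]
ABSETA_EDGES = [0.0, 1.2, 2.4]
SF = [[0.98, 0.96],
      [0.99, 0.97]]


def muon_sf_positional(x, y):
    """Positional lookup: nothing stops a caller from swapping x and y."""
    return SF[find_bin(PT_EDGES, x)][find_bin(ABSETA_EDGES, y)]


def muon_sf(*, pt, abseta):
    """Keyword-only wrapper so the axis order cannot be mixed up at the call site."""
    return muon_sf_positional(pt, abseta)


pt, abseta = 80.0, 0.5
right = muon_sf_positional(pt, abseta)  # 0.99
wrong = muon_sf_positional(abseta, pt)  # 0.96: both inputs clamped into edge bins, no error
safe = muon_sf(pt=pt, abseta=abseta)    # 0.99
```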