
Quick start guide

Christophe Delaere edited this page Feb 13, 2019 · 3 revisions

The first thing to do after downloading the latest SAMADhi release is to set the default connection details in ~/.samadhi, as described in https://github.com/cp3-llbb/SAMADhi/wiki/Installation. Edit the file to set the login, password and database server provided by your system administrators. In CP3, a central database is accessible on ingrid.

Once this is done, you can start populating and querying the database.

1. Importing datasets from DAS

To import a dataset from DAS, use the das_import.py script. A typical example is: ./das_import.py /ZZ_TuneZ2star_8TeV_pythia6_tauola/Summer12_DR53X-PU_S10_START53_V7A-v1/AODSIM --xsection=8.25561

By default, the script creates an entry using information extracted from DAS or from the dataset name, including the process name and centre-of-mass energy. The script's online help lists all available options.
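To illustrate how a process name and centre-of-mass energy can be read off a dataset name, here is a minimal Python sketch. It is not the code used by das_import.py: the function name parse_dataset_name and the parsing rules (first underscore-separated token as the process, first "<number>TeV" token as the energy) are assumptions made for this example.

```python
import re


def parse_dataset_name(name):
    """Split a DAS dataset path of the form /primary/processed/tier and guess
    the process and centre-of-mass energy from the primary dataset name.

    Hypothetical helper for illustration; the rules below are assumptions.
    """
    parts = name.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /primary/processed/tier, got %r" % name)
    primary, processed, tier = parts

    # Assumption: the energy appears as an underscore-delimited token such as "8TeV".
    match = re.search(r"(?:^|_)(\d+(?:\.\d+)?)TeV(?:_|$)", primary)
    energy_tev = float(match.group(1)) if match else None

    # Assumption: the process is the first underscore-separated token.
    process = primary.split("_")[0]

    return {
        "process": process,
        "energy_tev": energy_tev,
        "processed": processed,
        "tier": tier,
    }


info = parse_dataset_name(
    "/ZZ_TuneZ2star_8TeV_pythia6_tauola/Summer12_DR53X-PU_S10_START53_V7A-v1/AODSIM"
)
print(info["process"], info["energy_tev"], info["tier"])
```

Real dataset names do not always follow these conventions, which is presumably why the script also exposes options to set such fields explicitly.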

Datasets are the most basic objects in SAMADhi. Every sample ultimately points to one or more datasets. Datasets must also be known in order to identify unique events.

2. Adding a sample

Samples represent derived sets of events obtained from a dataset or from another sample. They include PAT-tuples, skims, RooDatasets (RDS), LHCO files, ntuples and histos. The add_sample.py script helps to add a sample to the database. For example: ./add_sample.py PAT /nfs/user/llbb/Pat_8TeV_537/Summer12_ZZ_S10 -a delaere --source_dataset=3 will add a sample located at the indicated path under /nfs/user, keep track of the author and relate it to dataset #3.

Possible options include the number of processed events, number of resulting events in the sample, normalization, luminosity, code version, creation time, etc. When not specified, the luminosity of MC samples will be guessed from the number of processed events and the cross-section of the parent dataset.
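The luminosity guess for MC samples presumably follows the usual relation L = N / sigma, where N is the number of processed events and sigma the cross-section of the parent dataset. The sketch below shows that arithmetic; the function name guess_luminosity and the choice of units (cross-section in pb, luminosity in pb^-1) are assumptions for this example, not SAMADhi's actual implementation.

```python
def guess_luminosity(nevents_processed, xsection_pb):
    """Return the equivalent integrated luminosity of an MC sample.

    Assumes the cross-section is given in pb, so the result is in pb^-1.
    Hypothetical helper, for illustration only.
    """
    if xsection_pb <= 0:
        raise ValueError("cross-section must be positive")
    return float(nevents_processed) / xsection_pb


# One million processed events with the cross-section used in the DAS import example.
lumi = guess_luminosity(1000000, 8.25561)
print("equivalent luminosity: %.1f pb^-1" % lumi)
```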

3. Documenting results

From one or more samples, one will usually produce various kinds of "results". A result can be a collection of stacked histograms, a cross-section, an exclusion or discovery plot, etc. SAMADhi lets you keep track of these results and share them efficiently with collaborators, while recording the key ingredients: the samples and datasets used to obtain them. Results may be considered the final product, the end of the chain. The main purpose of inserting them in the database is documentation and dissemination. This replaces the maintenance of an elog or wiki.

To register a result, use the add_result.py script: ./add_result.py /home/storage/bigDiscovery -d"how I will solve all your troubles" -a delaere -s 1,2,3

4. Extending the framework

SAMADhi tracks information related to datasets, samples and single events. It would be trivial to extend it with additional tables to store more analysis products, such as the MVA discriminants used in a sample or result. This may require defining new concepts like analysis code (which could use the GitHub API) or analysis activity.
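As a rough idea of what such an extension could look like, the sketch below uses an in-memory SQLite database as a stand-in. The table and column names (sample, mva_discriminant, sample_id, mva_id) are hypothetical and do not reflect SAMADhi's actual schema or database backend.

```python
import sqlite3

# In-memory database used purely as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical, simplified version of an existing sample table.
conn.execute(
    "CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Hypothetical new table: one row per MVA discriminant, linked to a sample.
conn.execute(
    """CREATE TABLE mva_discriminant (
           mva_id INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           sample_id INTEGER NOT NULL REFERENCES sample(sample_id)
       )"""
)

conn.execute("INSERT INTO sample (sample_id, name) VALUES (1, 'Summer12_ZZ_S10')")
conn.execute(
    "INSERT INTO mva_discriminant (name, sample_id) VALUES ('example_bdt', 1)"
)

# List each discriminant together with the sample it was used in.
rows = conn.execute(
    "SELECT s.name, m.name FROM mva_discriminant m "
    "JOIN sample s ON s.sample_id = m.sample_id"
).fetchall()
print(rows)
```

The foreign key is the important part: a new analysis product only becomes useful for bookkeeping once it points back to the samples or results it belongs to.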
