This repository provides utilities for Coffea and other data analysis packages: tools and scripts for data processing, analysis, and visualization.
- Fixing minor memory leak issue
- Implementing flattened output
- Utilities for Coffea and other data analysis packages
- Data processing scripts
- Analysis and visualization tools
The JobLoader class is responsible for loading meta job files and preparing them for processing by slicing the files into smaller jobs. It handles the initialization of paths, checking for existing paths, and writing job parameters to JSON files.
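The slicing step can be pictured with a minimal sketch. The names `slice_files` and `write_job_params`, the `files_per_job` parameter, and the JSON layout are all hypothetical and chosen for illustration; they are not taken from JobLoader itself.

```python
import json
from pathlib import Path


def slice_files(files, files_per_job):
    """Split a list of input files into consecutive chunks, one chunk per job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


def write_job_params(dataset, files, out_dir, files_per_job=2):
    """Write one JSON parameter file per job and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output path if missing
    written = []
    for index, chunk in enumerate(slice_files(files, files_per_job)):
        # Hypothetical naming scheme: <dataset>_job_<index>.json
        job_path = out_dir / f"{dataset}_job_{index}.json"
        with job_path.open("w") as handle:
            json.dump({"dataset": dataset, "files": chunk}, handle, indent=2)
        written.append(job_path)
    return written
```

Keeping each job's parameters in its own JSON file means a job can be rerun or inspected without reloading the full meta job file.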
The JobRunner class is responsible for initializing and managing the submission of jobs based on the provided run settings, job file, and event selection class. It supports both synchronous and asynchronous (WIP) job submission using a Dask client.
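The difference between the two submission modes can be sketched as follows. This is not JobRunner's code: `run_job` and `submit_jobs` are hypothetical names, and the standard library's `ThreadPoolExecutor` stands in for a Dask client so the example runs without a cluster. The stand-in is reasonable only because both expose a `submit(func, *args)` call that returns a future with a `result()` method.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job):
    """Placeholder for the per-job work; here it just counts the job's files."""
    return {"job": job["name"], "n_files": len(job["files"])}


def submit_jobs(jobs, client=None):
    """Run jobs one by one when no client is given; otherwise submit them all
    to the client first and then wait for each future in turn."""
    if client is None:
        return [run_job(job) for job in jobs]  # synchronous path
    futures = [client.submit(run_job, job) for job in jobs]  # asynchronous path
    return [future.result() for future in futures]


jobs = [
    {"name": "job_0", "files": ["a.root", "b.root"]},
    {"name": "job_1", "files": ["c.root"]},
]

sync_results = submit_jobs(jobs)
with ThreadPoolExecutor(max_workers=2) as executor:
    async_results = submit_jobs(jobs, client=executor)
```

Both paths return results in job order, so downstream code does not need to know which mode was used.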
The Processor class is designed to handle the processing of datasets, including initialization, directory setup, and remote file loading/transferring.
The BaseEventSelections class serves as a base class for event selections. It provides a framework for applying various selection criteria to events based on trigger, object, and mapping configurations.
The Object class is designed for handling object selections and serves as an observer of the events. It provides a framework for managing selection and mapping configurations for different objects such as Electrons, Muons, and Jets.
The weightedCutflow class extends the Cutflow class and is designed to handle weighted cutflows. It provides methods for initializing, adding, and retrieving the results of the cutflow.
The weightedSelection class extends the PackedSelection class and represents a set of selections on a set of events with weights. It provides methods for adding selections sequentially and generating a weighted cutflow.
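To make the idea of a weighted cutflow concrete, here is a minimal sketch of the arithmetic involved. It uses plain Python lists rather than the array types an analysis would normally use, and the function name `weighted_cutflow` and its row format are assumptions made for illustration, not the interface of weightedSelection or weightedCutflow.

```python
def weighted_cutflow(weights, selections):
    """Apply selections sequentially and record the surviving yield after each.

    weights:    one weight per event
    selections: ordered list of (name, mask) pairs, where mask holds one
                boolean per event
    Returns a list of (name, raw_count, weighted_count) rows.
    """
    passing = [True] * len(weights)
    rows = [("initial", len(weights), sum(weights))]
    for name, mask in selections:
        if len(mask) != len(weights):
            raise ValueError(f"mask for '{name}' has the wrong length")
        # An event survives only if it passed every earlier cut and this one.
        passing = [kept and ok for kept, ok in zip(passing, mask)]
        raw = sum(passing)
        weighted = sum(w for w, kept in zip(weights, passing) if kept)
        rows.append((name, raw, weighted))
    return rows


weights = [1.0, 0.5, 2.0, 1.5]
selections = [
    ("trigger", [True, True, False, True]),
    ("two_jets", [True, False, True, True]),
]
rows = weighted_cutflow(weights, selections)
# rows == [("initial", 4, 5.0), ("trigger", 3, 3.0), ("two_jets", 2, 2.5)]
```

The raw count and the weighted count diverge as soon as the weights differ from 1, which is why the two are tracked side by side.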
The CSVPlotter class is designed to plot histograms and other visualizations from CSV files. It utilizes various libraries such as mplhep, matplotlib, numpy, and pandas to create and manage plots. The class is initialized with an output directory and a configuration object for plotting.
The FileSysHelper class is a utility for performing file system operations on both local and remote file systems. It provides methods to check and create directories, as well as transfer files between directories with various options for handling existing files.
checkpath(pathstr, createdir=True, raiseError=False) -> bool
- Checks if a path exists. Optionally creates the directory or raises an error if the path does not exist.
transfer_files(srcpath, destpath, filepattern='*', remove=False, overwrite=False, **kwargs) -> None
- Transfers files matching a pattern from a source directory to a destination directory. Handles existing files by overwriting, renaming, or skipping them.
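The behaviour described for these two methods can be approximated for the local file system with the standard library. The sketch below reuses the documented names and parameters, but the bodies are assumptions: it treats the return value of `checkpath` as "the path already existed", handles local paths only, drops `**kwargs`, and leaves out the renaming option.

```python
import shutil
from pathlib import Path


def checkpath(pathstr, createdir=True, raiseError=False):
    """Return True if the path already exists (assumed meaning of the result).

    If it is missing, either raise, or create it as a directory when asked.
    """
    path = Path(pathstr)
    if path.exists():
        return True
    if raiseError:
        raise FileNotFoundError(f"path does not exist: {path}")
    if createdir:
        path.mkdir(parents=True, exist_ok=True)
    return False


def transfer_files(srcpath, destpath, filepattern="*", remove=False, overwrite=False):
    """Copy (or move, if remove=True) files matching filepattern.

    Existing destination files are overwritten when overwrite=True and
    skipped otherwise; the renaming option is left out of this sketch.
    """
    checkpath(srcpath, createdir=False, raiseError=True)
    checkpath(destpath, createdir=True)
    for src in sorted(Path(srcpath).glob(filepattern)):
        if not src.is_file():
            continue
        dest = Path(destpath) / src.name
        if dest.exists() and not overwrite:
            continue  # skip rather than clobber
        if remove:
            shutil.move(str(src), str(dest))
        else:
            shutil.copy2(src, dest)
```

Skipping by default is the conservative choice here: a repeated transfer never destroys data unless the caller explicitly passes `overwrite=True`.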
To add this repository as a submodule to your existing Git repository, follow these steps:
- Navigate to the root directory of your Git repository:
    cd /path/to/your/repo
- Add this repository as a submodule:
    git submodule add <repository-url> path/to/submodule
- Initialize and update the submodule:
    git submodule update --init --recursive
- Commit the changes:
    git add .gitmodules path/to/submodule
    git commit -m "Add submodule for Coffea utilities"
To update the submodule to the latest version, navigate to the submodule directory and pull the latest changes:
cd path/to/submodule
git pull origin main