All notable changes to this project will be documented in this file.
Older entries have been generated from GitHub releases. New entries aim to adhere to the format proposed by Keep a Changelog. This project adheres to Semantic Versioning.
#203 @AlexanderHeidelbach
- gbasf2: Fix `gbasf2_setup_path` setting not being passed through in some function calls.
- gbasf2: Local basf2 log level setting is now passed over to the grid jobs. You can now limit the log size of jobs with many warnings via `basf2.set_log_level(basf2.LogLevel.ERROR)`. This could fix some errors due to too large log sizes. Implemented by pickling local `basf2.logging.log_level`.
- gbasf2: Change the default gbasf2 setup script path to the CVMFS location in gbasf2 v5.8.2, i.e. `/cvmfs/belle.kek.jp/grid/gbasf2/pro/bashrc`. Reminder that this can still be customized via the `gbasf2_setup_path` setting. Resolves issue .
- gbasf2: Fully deprecate `gbasf2_install_directory` setting. It will be ignored from now on and a warning given if used. Instead please use the `gbasf2_setup_path` setting introduced in v0.10.1 to provide the exact path to the gbasf2 setup script. `gbasf2_install_directory` will not be used as a fall-back anymore as was the case in v0.10.1.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.10.1...main
- gbasf2: New setting `gbasf2_setup_path` which can be used to customize the path to the gbasf2 setup file directly (default: `"/cvmfs/belle.kek.jp/grid/gbasf2/pro/tools/setup.sh"`). It is a more flexible replacement for the `gbasf2_install_directory` setting, which will be removed in the future, since we can't predict potential name and path changes of the setup script between gbasf2 releases. @meliache #162
- gbasf2: Fix the issues caused by `gbasf2` release `v5r7` #197. Thanks to @MarcelHoh.
- gbasf2: #197 also includes the removal of the `--new` flag for `gb2_ds_get` when proxy group is not `belle`, as then the downloaded directory structure is different from what b2luigi expects. This is a hotfix; in the future we should aim to always use the new download style, therefore issue #200 was opened.
- gbasf2: Switch to the `--new` flag in `gb2_ds_get` which downloads files significantly faster than previously. Gbasf2 release v5r6 (November 2022) is required. #190.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.10.0...v0.10.1
- For local basf2 versions, change how the hash for the `basf2_release` Parameter is calculated. Now use basf2 functionality to get the version, to be consistent with the output of `basf2 --version`. The new hash encodes both the local and central basf2 release, the basf2 function `getCommitID`. When basf2 is not set up, print a warning before returning `"not_set"`. Thanks to @GiacomoXT in #193.

  Warning: If you use local basf2 versions, that is, your `basf2_release` is a git hash, this will change your b2luigi target output paths. This means that tasks that were marked complete might suddenly not be complete anymore after updating to this release. A workaround is to check for the new expected path via `python3 <steering_fname>.py --show_output` and rename the `git_hash=<…>` directory.
- Apply `max_events` Parameter not by changing the environment singleton, but instead forward it to the `basf2.process` call. This should hopefully not affect the behaviour in practice. Also by @GiacomoXT in #193
- Refactor the basf2 related examples to use more idiomatic, modern basf2 code, e.g. using `basf2.Path()` instead of `basf2.create_path()`. Also by @GiacomoXT in #193
- Fix example `SimulationTask` task in `basf2_chain_example.py`, which probably wasn't working as it was missing the Geometry module. Also by @GiacomoXT in #193
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.9.1...v0.10.0
- Fix circular import #188
- Add the ability to pass a custom hashing function to parameters via the `hash_function` keyword argument. The function must take one argument, the value of the parameter. It is up to the user to ensure unique strings are created. #189
- gbasf2: Switch to the `--new` flag in `gb2_ds_get` which downloads files significantly faster than previously. Gbasf2 release v5r6 (November 2022) is required. #190.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.9.0...v0.9.1
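
As an illustration of the `hash_function` entry above, the following is a minimal sketch of a function that takes the parameter value as its single argument and returns a short string. The name `short_dict_hash`, the JSON serialization and the 10-character truncation are assumptions made for this example, not something b2luigi prescribes; the commented-out usage line only shows where such a function would presumably be passed.

```python
import hashlib
import json


def short_dict_hash(value):
    """Return a short, deterministic string for a dict-like parameter value.

    Hypothetical helper: it takes exactly one argument, the parameter value,
    as the changelog entry requires of a custom hash function.
    """
    # Sort keys so that logically equal dicts serialize to the same text.
    serialized = json.dumps(value, sort_keys=True, default=str)
    # Truncating the digest keeps paths short; uniqueness is then only
    # probabilistic, and it is up to the user to judge whether that suffices.
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()[:10]


# Assumed usage inside a task definition (not executed here):
#     cuts = b2luigi.Parameter(hash_function=short_dict_hash)
```

Sorting the keys before hashing means two dictionaries with the same content but different insertion order map to the same string.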
- gbasf2: Fix bug introduced in #181 when generating basf2 queries with just a simple `.root` extension, which raised false-positive errors. The splitting functionality was moved into a separate function and extensive unit tests were added. Thanks @schmitca for reporting #184.
- `task_iterator` now returns a unique list of tasks. The task graph is a DAG which is traversed through recursion in `task_iterator` like a tree. If multiple tasks had the same task as a requirement (i.e. multiple nodes share a child), it was returned multiple times in the task iterator. Returning each task only once results in performance improvements when checking the requirements. #186. Thanks @MarcelHoh for the initial PR.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.8.2...v0.9.0
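
The `task_iterator` change above can be pictured with a small, self-contained sketch. This is not the b2luigi implementation: the function `iter_unique_tasks`, the plain-dict graph and the string task names are assumptions chosen for illustration. It shows a recursive depth-first traversal that remembers visited nodes so that a shared requirement is yielded only once.

```python
def iter_unique_tasks(task, requires, _seen=None):
    """Yield every task reachable from `task` exactly once (depth-first).

    `requires` is a hypothetical mapping from a task to its direct requirements.
    """
    if _seen is None:
        _seen = set()
    if task in _seen:
        # Already yielded via another parent: skip the whole subtree.
        return
    _seen.add(task)
    yield task
    for dependency in requires.get(task, ()):
        yield from iter_unique_tasks(dependency, requires, _seen)


# Diamond-shaped graph: A requires B and C, and both require D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order = list(iter_unique_tasks("A", graph))
print(order)
```

Without the `_seen` set, `D` would be visited once per parent, and in deep graphs with many shared requirements the number of repeated visits can grow quickly.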
- gbasf2: Fix gbasf2 glob queries (e.g. for downloading) for basf2 output files with multiple extensions, e.g. `<file>.udst.root`, `<file>.mdst.root`. #181. Thanks @schmitca for reporting, reviewing and testing.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.8.1...v0.8.2
- gbasf2: Fix `ioctl` error in `gb2_proxy_init` by reading in the password via `b2luigi` and then supplying the password to that command directly, instead of letting `gb2_proxy_init` handle the password prompt. #172 @bilokin
- gbasf2: Add `gbasf2_proxy_group` and `gbasf2_project_lpn_path` parameters to switch between gbasf2 groups. #175 @bilokin
- Add automatic "needs changelog" PR labeller as GitHub workflow #166
- Update `pre-commit` hooks. Most notably for the developers, update the `flake8` syntax and style-checker to version 5.0.4, which might slightly change what style is accepted. This should also fix an issue with the old flake8 version not being compatible with the latest version of `importlib_meta`, which caused the pre-commit flake8 hook in the GitHub Actions to fail. In the process also migrated the pre-commit config format to the new layout.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.7.6...v0.8.1
- htcondor: Make `HTCondorProcess.get_job_status` a method again. It was turned into a property accidentally in #158. See issue #164 @eckerpatrick and PR #165 @mschnepf.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.7.5...v0.7.6
- htcondor: Do up to 3 retries for getting job status with `condor_q` #155
- gbasf2: Add caching and unit tests to `get_dirac_user` #156
- Add @mschnepf to the contributors for #158
- Some minor documentation improvements #151 and typo fix in help message #153.
- gbasf2: Adapt to new file name for gbasf2 setup file (`setup` → `setup.sh`) #160
- gbasf2: Ensure proxy is initialized before running `get_proxy_info` to get dirac user #156
- htcondor: Don't fail when htcondor job status is `suspended` or `transferring_output` #158. Thanks to @mschnepf 🙇.
- gbasf2: Use `retry2` package for retrying getting of gbasf2 project status instead of my own recursive loop #161
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.7.4...v0.7.5
- Add a `CHANGELOG.md` file in addition to the release notes on GitHub
- gbasf2: Fix moving of downloaded datasets with multiple datablocks (subs) #150
- gbasf2: If an error happened during proxy initialization, an error was raised, but the `stderr` argument was wrong, which was fixed in #149
- gbasf2: `get_unique_lfns` in some cases returned a set and in some cases a list. Changed it to always return sets.
Small patch release for the gbasf2 process adding tests and better error checks for subprocess to make future debugging of problems like e.g. #138 easier
- Check output of `gb2_proxy_init` for errors by @meliache in nils-braun#142
  - if `gb2_proxy_init` fails due to a wrong certificate password, re-run the command until the user enters a correct password
  - raises a `CalledProcessError` when there is any other error string in the stdout of `gb2_proxy_init`. Since that script doesn't exit with error codes in case of errors, errors could otherwise go unnoticed and result in errors in later commands, such as when using `gb2_proxy_info`. Tracking down which command originally failed might be some work, so this should make debugging much easier.
- Don't suppress `CalledProcessError` in `get_proxy_info`
- Add unit tests for `setup_dirac_proxy` and `get_proxy_info` by mocking possible outputs of `gb2_proxy_info` and `gb2_proxy_init`.
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.7.2...v0.7.3
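
The idea behind the `gb2_proxy_init` check above, turning error text in the output of a command that always exits successfully into an exception, can be sketched roughly as follows. The function name `check_output_for_errors` and the marker strings are hypothetical and are not taken from b2luigi or gbasf2; only `subprocess.CalledProcessError` is the standard-library exception named in the entry.

```python
import subprocess


def check_output_for_errors(cmd, stdout, error_markers=("error", "failed")):
    """Raise CalledProcessError if `stdout` contains one of the error markers.

    The markers are illustrative assumptions; a real tool would need the
    strings that the wrapped command actually prints.
    """
    lowered = stdout.lower()
    for marker in error_markers:
        if marker in lowered:
            # The command itself reported success, so a non-zero return code
            # is made up here to signal the failure to the caller.
            raise subprocess.CalledProcessError(returncode=1, cmd=cmd, output=stdout)
    return stdout


try:
    check_output_for_errors(["some_command"], "Error: something went wrong")
except subprocess.CalledProcessError as exc:
    print("command failed:", exc.output)
```

Raising at the point where the output is first read keeps the failure close to the command that caused it, which is the debugging benefit the entry describes.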
- Test `ignore_additional_command_line_args` option by @meliache in nils-braun#128
- Add Moritz Baur and Artur Gottman to contributors list in documentation by @meliache in nils-braun#133, nils-braun#137
- Fix gb2_proxy_init error due to wrong HOME from gbasf2 setup script by @meliache in nils-braun#141
- Remove unused `Gbasf2Process` helper method to capture failed files from stdout by @meliache in nils-braun#136
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.7.1...v0.7.2
- Added `inherits_without` decorator to enable inheritance of everything except chosen parameters from task by @sognetic in nils-braun#106
- Tasks can return all input or all output files with a single call by @anselm-baur in nils-braun#111
- Allow for gbasf2 projects with multiple output `sub<xy>` directories by @meliache in nils-braun#122
- Bugfix: Don't use deprecated exception message property by @meliache in nils-braun#120
- Fix gbasf2 batch example for new basf2 releases: import ROOT by @meliache in nils-braun#123
- Ignore flake8 error for unused ROOT import by @meliache in nils-braun#124
- Give instances of `MasterTask` in doc examples less sensitive names by @meliache in nils-braun#118
- Replace parsing of gb2 command output with DIRAC API calls by @philiptgrace in nils-braun#121
- Use latest gbasf2 release on cvmfs as default install directory by @meliache in nils-braun#126
- Improve dirac proxy validity time handling by @philiptgrace in nils-braun#127
- For gbasf2 download with `gb2_ds_get`, use new `--failed_lfns` option to get a file with the LFNs for which the download failed instead of parsing stdout. By @ArturAkh in nils-braun#132
- @sognetic made their first contribution in nils-braun#106
Full Changelog: https://github.com/nils-braun/b2luigi/compare/v0.6.7...v0.7.1
- Improved gbasf2 download from grid. In particular, when re-trying a download, only re-download files which have previously failed. Store failed files in a `failed_files.txt`. The downside is that this relies on command output parsing which might break between releases. If errors occur, this can be worked around by removing the `failed_files.txt`, triggering a full re-download.
- Set progress bar in central scheduler for tasks executed as a gbasf2 project showing what percentage of jobs in the project is done and display the total numbers in the status.
- Fix gbasf2 download retry issues for new gbasf2 releases
- Fix in gbasf2 batch for memory error caused by changes in the ROOT from basf2 externals v10 (affects latest light releases).
#97 Gbasf2 Bugfix: Fix download for failed files
#95 Fix to show correct version number for b2luigi.__version__
Minor patch release but I decided to release this early so that users can use the version number to validate that they are using the latest release.
This release features small quality-of-life improvements and fixes for the gbasf2 batch, so I decided to make it a minor release.
Since we're still at major release 0, instead of strict SemVer I think I will create minor releases for significant changes to b2luigi itself, where all users should read the release notes, and patch releases for small patches that come out shortly after a release or for small non-API-breaking changes to individual batches only, which only affect users of that batch and don't really change b2luigi itself.
- #75 allow defining grid input LFNs via text files with `gbasf2_input_dslist` setting, analogous to `gbasf2 --input_dslist`
- #91: Several improvements of gbasf2 handling (thanks to @ArturAkh)
  - possibility to add input datafiles with `gbasf2_input_datafiles` option, which will be downloaded from SEs in addition. This is useful in case the sandbox files exceed 10 MB.
  - improved rescheduling: instead of performing it for each single failed job separately, perform it at once. Keeping track of n_retries is still maintained in the implementation of this pull request.
  - improved downloading of datasets: in case of failed downloads, only the failed ones are downloaded again, based on a collection of LFNs created from `gb2_ds_get` stdout.
- more unit tests for more stability in the future and getting a handle on growing complexity.
- #91: Several improvements of gbasf2 handling
  - fix of RuntimeError ---> RuntimeWarning conversion: first argument of `warnings.warn` should be a string. Otherwise, an uncaught TypeError is raised, followed by a PipeError of luigi.
  - added an improved handling of the `JobStatus` for `Done` jobs, since in some (rare) cases, `JobStatus` is set to `Done`, while `ApplicationStatus` is not `Done` (in particular, has an Upload error for output file).
- #92 Fix CI shield on github
- #88 Bugfix gbasf2 dataset download where failed download raises runtime error instead of intended warning, thanks to @philiptgrace for finding and fixing this.
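
For the `warnings.warn` fix listed above, a minimal sketch of the intended call shape is shown below: the message string goes first and the warning category second. The helper name `warn_instead_of_raise` and the example message are assumptions for illustration and do not come from the b2luigi code.

```python
import warnings


def warn_instead_of_raise(error):
    """Report `error` as a RuntimeWarning instead of raising it."""
    # Message string first, Warning subclass second.
    warnings.warn(str(error), RuntimeWarning)


# Record warnings so the result can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_instead_of_raise(RuntimeError("download failed"))

print(caught[0].category.__name__, caught[0].message)
```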
Oops, release v0.6.0 was mistakenly missing two PRs, #81 and #77, since I added the tag at the HEAD of the last branch that I merged into that release (#79), but that branch wasn't rebased onto the head of main and didn't contain the LSF bugfix PR #81 and the gbasf2 feature PR #77 for supporting global tags. So this patch release includes those PRs and also a fix to our PyPI publishing workflow (#82).
- #81: Bugfix in LSF batch code for getting settings
- #82: Fix missing dependency in GitHub workflow for automatic publishing to PyPI
- #77: the global tags have also been added to the subset of the basf2 state that is pickled and sent to the grid. Remember, the gbasf2 batch wrapper just pickles the basf2 path and sends this to the grid, so everything else that is saved in the basf2 state is not transferred. In the previous release we already added pickling the basf2 variable aliases separately, now the global tags have also been added. If you have ideas how to handle this more generally, feel free to contribute via issue #35
- Use GitHub Actions / workflows for CI and PyPI deployment #78

  Code-coverage tests automated and enforced with `codecov` to encourage writing unit tests. This already resulted in some new unit tests for the `htcondor` batch :)
- New optional `job_name` setting for assigning human-readable names for groups of jobs in LSF and HTCondor batches. This is useful when checking job statuses by hand. See documentation for more. #76, #79
- #55 Option to only pass known command line arguments, useful in scripting if you want to pass additional command line args that should be forwarded to the script instead of being used by b2luigi
- #70 Users can now add a `dry_run` method to their tasks which will be called during dry-run, e.g. if the b2luigi steering file is executed with `python3 <b2luigi_file_name>.py --dry-run`
- Adapt download of job outputs to new gbasf2 v5 output directory structure by adding `/sub00` to LFNs #57. Caveats are:
  - In future releases gbasf2 will split the outputs of large projects into multiple `sub<xy>` directories, but this isn't done as of now. These other subdirectories are not supported yet, but I created issue #80 as a reminder
  - The output of `mdst`/`udst` files is moved into subdirectories deeper in the hierarchy. We don't support that yet either. I have to think about whether I can figure out in a smart way what the output is or if the user should provide some additional info. Best would be to do it in parallel to what gbasf2 does. See issue #58 for more, help is welcome.
- More stable downloads with `gb2_ds_get`. When I started developing the gbasf2 wrapper, I expected that the failing of downloads would be a rare exception, but I realized that it is the norm and adapted the code to handle that more gracefully.
  - #72 if one job download fails, this doesn't raise a full exception anymore, so all the other tasks continue to run/download their outputs. The only thing that happens is that this particular task is marked as `failed`
  - downloaded datasets persist after failure #67: If a download fails, the partially downloaded dataset remains in a directory with the `.partial` ending next to the expected output directory. On the one hand this ensures that b2luigi doesn't prematurely mark a task as completed until all job outputs in a gbasf2 project are downloaded completely. The `.partial` directory is only renamed to the final output directory, which b2luigi uses as a completeness target, once all jobs have been downloaded. On the other hand, keeping the partial downloads means that the download doesn't have to start from scratch every time that you re-run a failed task. So, if a gbasf2 task failed downloading, you can just re-run the task and it will re-run the download of the missing outputs in your `.partial` directory
  - #62: Option to disable automatic log download from grid via `gbasf2_download_logs` setting. Logs are useful for debugging and reproducibility and I think they should always be stored in addition to the data itself. However, for gbasf2 it can take quite a while to download logs, so if you are in a hurry it can be useful to disable them and just look them up online with the dirac web app if you need them.
- New luigi release 3 as dependency. This drops python2 support in luigi, which we didn't have anyway in b2luigi, so there should be no backwards incompatibility issues. On the plus side, this solves a dependency conflict with jupyter due to different required `tornado` versions
- Allow dashes and underscores (`_`, `-`) in `gbasf2` project names #45
- allow variable aliases #40
- fix issue with `core.utils.get_filename()` in jupyter #34
- fix code in some basf2 examples to work with newer basf2 releases #37, #38, #37
- fix logic bug in setting `gbasf2_additional_params` #43
- modified time parsing that recognizes dirac proxy validity times > 24h #46
- workaround gbasf2 wildcard bug #41
- for dirac proxy handling, replace gbasf2 command string-parsing with direct communication with the DIRAC API via sub-script #51. Intended as a feature, but I think this also fixed a bug with a newer gbasf2 release
- default branch is now `main` #39
- deprecate some settings (#22)
- corrected path to decfile for new structure in basf2 release-04 (#23)
- Adding option to provide user-defined location of the task executable. This can be used analogously to the optional task attribute . (#25)
- Soft wrapper for gbasf2 as a b2luigi BatchProcess (#32)
- Warning if forward slash in parameter (#27)
- change link to documentation from latest to stable (#29)
- additional requirements structure (#30)
Features in this release:
- small bugfixes with envs and basf2 tasks (@nils-braun)
Features in this release:
- Added documentation
- Re-add an old feature for log files, will soon be deprecated.
Features in this release:
- Fixed a problem with basf2 module importing (@nils-braun)
- Better handling for filesystems (#21) (@nils-braun) Started supporting file copy mechanisms in htcondor, do only create folders when needed, better relative path handling.
Features in this release:
- Added relevant authors in docu (Nils Braun)
- Fixed travis config (Nils Braun)
Features in this release:
- Fixed required versions of packages (#15) (@nils-braun)
Features in this release:
- Batch Improvements (#20) (@nils-braun): Generalize and simplify the batch setup and the dispatch method. Updated and added a lot of documentation. Please see the docu or the examples to check out the new ways to setup the batch environment.
- HTCondor support (#19) (@welschma): Added long-needed support for HTCondor batch systems. Building block for #20.
- Added Community Documents (@nils-braun)
- Fix serialized parameters for basf2 tasks (#18) (@elimik31): Fixed problems after refactoring in basf2 tasks
- Fix for get_basf2_git_hash to work with new basf2 tools (#17) (@elimik31): Check for the correct release name or head