v1.7.0

@jpata jpata released this 08 Apr 06:09
· 41 commits to main since this release
4caf602

What's Changed

The primary feature of this release is that PyTorch is now the main mode of training.
The CMS status was presented at https://indico.cern.ch/event/1399688/#1-ml-for-pf.

  • switch pytorch training to tfds array-record datasets by @farakiko in #228
  • Timing the ONNX model, retrain CMS-GNNLSH-TF by @jpata in #229
  • fixes for pytorch, CMS t1tttt dataset, update response plots by @jpata in #232
  • fix pytorch multi-GPU training hang by @farakiko in #233
  • feat: specify number of samples as cmd line arg in pytorch training and testing by @erwulff in #237
  • Automatically name training dir in pytorch pipeline by @erwulff in #238
  • pytorch backend major update by @farakiko in #240
  • Update dist.barrier() and fix stale epochs for torch backend by @farakiko in #249
  • multi-bin loss in TF, plot fixes by @jpata in #234
  • PyTorch distributed num-workers>0 fix by @farakiko in #252
  • speedup of the pytorch GNN-LSH model by @jpata in #245
  • Implement HPO for PyTorch pipeline. by @erwulff in #246
  • fix tensorboard error by @farakiko in #254
  • fix config files by @erwulff in #255
  • making the 3d-padded models more efficient in pytorch by @jpata in #256
  • Fix pytorch inference after #256 by @jpata in #257
  • Update training.py by @jpata in #261
  • Reduce the number of data loader workers per dataset in pytorch by @farakiko in #262
  • fix inference by @farakiko in #264
  • Implementing configurable checkpointing. by @erwulff in #263
  • restore onnx export in pytorch by @jpata in #265
  • remove outdated forward_batch from pytorch by @jpata in #266
  • Separate multiparticlegun samples from singleparticle gun samples by @farakiko in #267
  • compare all three models in pytorch by @jpata in #268
  • Allows testing on a given --load-checkpoint by @farakiko in #269
  • added clic evaluation notebook by @jpata in #272
  • Fix --load-checkpoint bug by @farakiko in #270
  • Implement CometML logging to PyTorch training pipeline. by @erwulff in #273
  • Add command line argument to choose experiments dir in PyTorch training pipeline by @erwulff in #274
  • Implement multi-gpu training in HPO with Ray Tune and Ray Train by @erwulff in #277
  • Better CometML logging + Ray Train vs DDP comparison by @erwulff in #278
  • Fix checkpoint loading by @erwulff in #280
  • Learning rate schedules and Mamba layer by @erwulff in #282
  • use modern optimizer, revert multi-bin loss in TF by @jpata in #253
  • track individual particle loss components, speedup inference by @jpata in #284
  • Update the jet pt threshold to be the same as the PF paper by @farakiko in #283
  • towards v1.7: new CMS datasets, CLIC hit-based datasets, TF backward-compat optimizations by @jpata in #285
  • fix torch no grad by @jpata in #290
  • pytorch regression output layer configurability by @jpata in #291
  • Implement resume-from-checkpoint in HPO by @erwulff in #293
  • enable FlashAttention in pytorch, update to torch 2.2.0 by @jpata in #292
  • fix pad_power_of_two by @jpata in #296
  • Feat val freq by @erwulff in #298
  • normalize loss, reparametrize network by @jpata in #297
  • fix up configs by @jpata in #300
  • clean up loading by @jpata in #301
  • Fix unpacking for 3d padded batch, update plot style by @jpata in #306

Full Changelog: v1.6...v1.7.0