Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant data members from InputFileCatalog to reduce memory use #47013

Merged
merged 4 commits into from
Jan 9, 2025

Conversation

makortel
Copy link
Contributor

@makortel makortel commented Dec 19, 2024

PR description:

In #46975 (comment) I discovered the InputFileCatalog took ~488 MB memory per stream in a production DIGI job overlaying premixed pileup, where the job was configured with nearly 500k pileup files. A quick look in InputFileCatalog showed the input file names are more or less stored three times

  • in fileNames_ to communicate a copy of the input file names from constructor to init()
    • I think this copy is completely unnecessary, and is removed in the second commit
  • in logicalFileNames_ to partly communicate input file names from constructor to init(), and partly to allow cheap logicalFileNames() getter to them
    • I noticed the logicalFileNames() function is not really used, so in order to avoid storing the file names in member data, I decided to remove the member function and the logicalFileNames_ member in the third commit
  • in FileCatalogItem::lfn_ stored in fileCatalogItems_ member
    • This copy of the lfn_ is being used

An alternative to the second commit could be to keep the logicalFileNames_, and store the FileCatalogItem::lfn_ as string_view, but I felt that to be a tiny bit more complex.

The fourth commit avoids one copy of the std::vector<std::string> of the file names when the InputFileCatalog is constructed from a temporary vector, which is the case with all Sources that use InputFileCatalog.

The first commit adds a unit test for InputFileCatalog

Resolves cms-sw/framework-team#1113

PR validation:

Unit tests passed in CMSSW_14_0_18. With example job #46975 (comment) MaxMemoryPreload showed 197 MB reduction in peak allocated memory on 1 thread.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

To be backported to 14_1_X and 14_0_X.

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 19, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/Catalog (core)
  • FWCore/Sources (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@missirol, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@makortel
Copy link
Contributor Author

@Dr15Jones please test

@makortel
Copy link
Contributor Author

Unit tests passed in CMSSW_14_0_18.

It seems to me we don't have good tests for InputFileCatalog :(

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-257155/43567/summary.html
COMMIT: ae9d552
CMSSW: CMSSW_15_0_X_2024-12-19-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47013/43567/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 46
  • DQMHistoTests: Total histograms compared: 3510017
  • DQMHistoTests: Total failures: 389
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3509608
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
  • Checked 202 log files, 172 edm output root files, 46 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor Author

Unit tests passed in CMSSW_14_0_18.

It seems to me we don't have good tests for InputFileCatalog :(

Added a unit test for InputFileCatalog

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #47013 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Copy link
Contributor Author

makortel commented Jan 8, 2025

Ok, force-pushing again with a modified commit id helped

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 8, 2025

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 8, 2025

Pull request #47013 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Copy link
Contributor Author

makortel commented Jan 8, 2025

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 8, 2025

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-257155/43677/summary.html
COMMIT: 5744eaa
CMSSW: CMSSW_15_0_X_2025-01-08-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47013/43677/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@makortel
Copy link
Contributor Author

makortel commented Jan 8, 2025

Comparison failures are related to #46416

@makortel
Copy link
Contributor Author

makortel commented Jan 8, 2025

+core

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 8, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @rappoccio, @sextonkennedy, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit c504609 into cms-sw:master Jan 9, 2025
11 checks passed
@makortel makortel deleted the reduceInputFileCatalogMemory branch January 9, 2025 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove redundant data members from InputFileCatalog to reduce memory use
4 participants