Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

L1TMuonEndCapTrackProducer::produce() takes 96 MB memory per stream #42526

Open
makortel opened this issue Aug 9, 2023 · 16 comments
Open

L1TMuonEndCapTrackProducer::produce() takes 96 MB memory per stream #42526

makortel opened this issue Aug 9, 2023 · 16 comments

Comments

@makortel
Copy link
Contributor

makortel commented Aug 9, 2023

Live memory profiles of #40437 (comment) show that L1TMuonEndCapTrackProducer::produce() takes 100 MB memory / stream

The L1TMuonEndCapTrackProducer module itself is an edm::stream. The memory consumption can be split into

Assuming the PtAssignmentEngine::load() is indeed BDT or similar, does its representation really need to be that large? Ideally all of these would be in GlobalCache of the module (e.g. Tensorflow stuff), or in the EventSetup (e.g. the BDT whose content apparently depends on the L1TMuonEndCapParams and L1TMuonEndCapForest EventSetup data products).

@makortel
Copy link
Contributor Author

makortel commented Aug 9, 2023

assign l1

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 9, 2023

New categories assigned: l1

@epalencia,@aloeliger you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Aug 9, 2023

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

makortel commented Aug 9, 2023

Moving the Tensorflow stuff to GlobalCache was also discussed in #32894

@eyigitba
Copy link
Contributor

eyigitba commented Aug 9, 2023

Hi @makortel , as you said the main contribution here is loading of the BDT in PtAssignmentEngine::load and coordinate conversion LUTs in SectorProcessorLUT::read. I don't know how we can reduce these easily.

Regarding the Tensorflow stuff and GlobalCache, I currently don't have the time to rework the code unfortunately. If you and/or L1T offline software think that this should be done, I can try to pass this task to someone within the EMTF group and we can see how quickly we can implement this.

@makortel
Copy link
Contributor Author

makortel commented Aug 9, 2023

Hi @eyigitba I think addressing the PtAssignmentEngine::load() (at least making read-only parts shared across streams; I hope the memory there is mostly read-only) would be important.

The Tensorflow part would be nice (e.g. if the model would become larger in the future, or serving as an example for others), but with 0.5 MB / stream not that important today.

@eyigitba
Copy link
Contributor

eyigitba commented Aug 9, 2023

Hi @makortel , ok we'll look into the BDT loading and also see how we can improve the Tensorflow part.

How urgent is this btw?

@VinInn
Copy link
Contributor

VinInn commented Aug 10, 2023

Memory budget is 2GB per stream (including shared component and I/O buffers).
L1TMuonEndCapTrackProducer alone accounts for 5% of that.
To L1 mngmt to judge how urgent it WAS.

@VinInn
Copy link
Contributor

VinInn commented Aug 10, 2023

BTW: why using your own implementation of a Forest and not the highly optimized common CMS one
https://cmssdt.cern.ch/dxr/CMSSW/source/CondFormats/GBRForest/interface/GBRForest.h#24
?

I would advice to switch to that.

@eyigitba
Copy link
Contributor

Thanks for the advice @VinInn . I don't know why it was implemented like this, but this code is quite old, from 2016 or so. I unfortunately don't have much time these couple of weeks, but we'll discuss in the EMTF group to come up with a solution soon.

@makortel
Copy link
Contributor Author

The PtAssignmentEngine::load() came up again in #46040 (comment), this time with ~47 MB / stream cost. On an 8-thread/stream PromptReco that corresponds to 376 MB. Avoiding the per-stream copies could potentially save ~329 MB, and moving to GBRForest would likely save more.

In my opinion it would be great if this would be addressed for 2025 data taking.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 17, 2024

cms-bot internal usage

@makortel
Copy link
Contributor Author

type performance-improvements

@makortel
Copy link
Contributor Author

makortel commented Dec 9, 2024

@eyigitba @cms-sw/l1-l2 Any news?

@eyigitba
Copy link
Contributor

@makortel , not yet from my side. I have a student looking into this but nothing to discuss for now

@makortel
Copy link
Contributor Author

Came up in #46975 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants