Non-reproducibility in TrackerPhase2OTL1Track #47071

Open
makortel opened this issue Jan 9, 2025 · 13 comments

Comments

@makortel
Contributor

makortel commented Jan 9, 2025

Tests of PRs unrelated to L1T show differences in workflows 29634.911 and 29834.999 in the TrackerPhase2OTL1Track, TrackerPhase2OTL1TrackV, and L1T folders. In #47051 (comment):

  • 29634.911 had 66 differences
  • 29834.999 had 15 differences
@makortel
Contributor Author

makortel commented Jan 9, 2025

assign l1, dqm, upgrade

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2025

New categories assigned: l1,dqm,upgrade

@aloeliger,@antoniovagnerini,@epalencia,@Moanwar,@rseidita,@srimanob,@subirsarkar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2025

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Jan 9, 2025

A new Issue was created by @makortel.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@mmusich
Contributor

mmusich commented Jan 21, 2025

@tomalin FYI

@skinnari
Contributor

Hi @makortel, I am confused why these are all showing as failures. If I look at the actual histograms, they all look fine (e.g. https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_15_0_X_2025-01-09-1100+9e6aa1/66377/29634.911_TTbar_14TeV+Run4D110_DD4hep/TrackerPhase2OTL1Track_Tracks_HQ.html). There is a one-entry difference between the two sets (246 vs 247); is that what is causing these all to be flagged as red?

@makortel
Contributor Author

there is one entry difference between the two sets (246 vs 247), is that what is causing these all to be flagged as red?

Probably? (The technical question would be for @cms-sw/pdmv-l2, whose histogram comparison infrastructure is used in the PR tests.)

Looking at https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_15_0_X_2025-01-09-1100+9e6aa1/66377/29634.911_TTbar_14TeV+Run4D110_DD4hep/TrackerPhase2OTL1Track__Tracks_HQ_Track_HQ_NStubs.png


There is a "clear" difference between the blue and red histograms in the 4-5 bin (probably by one entry).

@AdrianoDee
Contributor

AdrianoDee commented Jan 23, 2025

The original sin here is that the comparison is performed via the BinToBin statistical test (instead of the Chi2 test), and the default threshold is set to 0.9999. It basically checks whether the bins are identical, without taking any uncertainty into account. The rank is then the fraction of perfectly matched bins, so in cases like this one, with very few bins, even a single mismatch triggers the failure. I'm trying to find where the BinToBin method is selected.

Now the question would be: do we want to spot these discrepancies? Maybe this case is a bit pathological (and the test could, e.g., take the histogram population into account), but in general I think it would be useful to be aware of these irreproducibilities, given that we run on exactly the same events.
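To make the failure mode concrete, here is a minimal Python sketch of the bin-by-bin exact-match ranking described in the comment above. It assumes, hypothetically, that the rank is simply the fraction of bins with identical contents and that a pair is flagged when the rank falls below the threshold; the function names and the 10-bin histograms are illustrative and are not taken from the actual comparison code (details such as how empty bins are counted are also assumptions).

```python
# Hypothetical sketch of a BinToBin-style exact-match comparison.
# Assumption: rank = fraction of bins with identical contents,
# and a pair is flagged when rank < threshold.

def bin_to_bin_rank(reference, candidate):
    """Return the fraction of bins whose contents are exactly equal."""
    if len(reference) != len(candidate):
        raise ValueError("histograms must have the same binning")
    if not reference:
        return 1.0
    matched = sum(1 for r, c in zip(reference, candidate) if r == c)
    return matched / len(reference)


def is_flagged(reference, candidate, threshold=0.9999):
    """True when the pair would be reported as a difference."""
    return bin_to_bin_rank(reference, candidate) < threshold


# Hypothetical 10-bin histograms differing by a single entry in one bin
# (246 vs 247 total entries).
ref = [10, 25, 40, 60, 50, 30, 15, 10, 4, 2]
new = [10, 25, 40, 60, 51, 30, 15, 10, 4, 2]

print(bin_to_bin_rank(ref, new))  # 0.9
print(is_flagged(ref, new))       # True
```

With ten bins, a single one-entry mismatch already drops the rank to 0.9, far below 0.9999 (or 0.999999999999), regardless of how statistically insignificant the difference is.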

@AdrianoDee
Contributor

I'm trying to find where the BinToBin method is selected.

OK, of course all the PR comparisons are BinToBin, while for RelMon we usually use the Chi2 test. And the threshold is actually much higher: 0.999999999999.
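For contrast, a Chi2-based comparison weighs each bin difference against its statistical uncertainty. The sketch below uses the textbook two-sample chi-square homogeneity statistic for unweighted histograms, plus a closed-form chi-square p-value for integer degrees of freedom. It is an assumption about how such a test behaves, not the exact RelMon or ROOT `TH1::Chi2Test` configuration, and the function names and histograms are hypothetical.

```python
import math


def chi2_homogeneity(h1, h2):
    """Two-sample chi-square statistic and ndf for unweighted histograms.

    chi2 = sum_i (N2*n1_i - N1*n2_i)**2 / (N1*N2*(n1_i + n2_i)),
    summed over bins with n1_i + n2_i > 0; ndf = (number of such bins) - 1.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same binning")
    n1_tot, n2_tot = sum(h1), sum(h2)
    if n1_tot <= 0 or n2_tot <= 0:
        raise ValueError("both histograms need entries")
    chi2 = 0.0
    used_bins = 0
    for a, b in zip(h1, h2):
        if a + b > 0:
            chi2 += (n2_tot * a - n1_tot * b) ** 2 / (n1_tot * n2_tot * (a + b))
            used_bins += 1
    return chi2, used_bins - 1


def chi2_survival(x, k):
    """P(X >= x) for a chi-square variable with integer k >= 1 degrees of freedom."""
    if k < 1:
        raise ValueError("need at least one degree of freedom")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j < k/2} (x/2)**j / j!
        term, total = 1.0, 1.0
        for j in range(1, k // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_r x**(r-1) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


# Hypothetical 10-bin histograms (246 vs 247 entries) differing by one entry in one bin.
h_ref = [10, 25, 40, 60, 50, 30, 15, 10, 4, 2]
h_new = [10, 25, 40, 60, 51, 30, 15, 10, 4, 2]

chi2, ndf = chi2_homogeneity(h_ref, h_new)
p_value = chi2_survival(chi2, ndf)
print(f"chi2/ndf = {chi2:.4f}/{ndf}, p-value = {p_value:.6f}")  # p-value very close to 1
```

On such a pair the p-value comes out very close to 1, so a Chi2-based check would not flag a one-entry difference; whether that is desirable is the question raised above about wanting to spot such irreproducibilities.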

@makortel
Contributor Author

Now the question would be: do we want to spot these discrepancies? Maybe this case is a bit pathological (and the test could, e.g., take into account the histogram population), but in general I think it would be interesting to be aware of this irreproducibilities given we run exactly on the same events.

So far we have (in practice, at least) required CPU code to be fully reproducible within the same x86 microarchitecture and CPU vendor when running on 1 thread. In all cases so far the cause for non-reproducibility has been a bug somewhere.

@srimanob
Contributor

Is this issue an extension of #45505 ?

@makortel
Contributor Author

Is this issue an extension of #45505 ?

Based on the (little) information in #45505, I'd guess that issue has a different cause than the one reported here.

@srimanob
Contributor

OK, thanks. So, looking at the list of workflows, the issue is not limited to DD4hep (as in #35109); it also affects the DB one (with DDD).
