Determinism of outputs #114

Open
1 of 2 tasks
joeflack4 opened this issue Nov 13, 2024 · 0 comments

Overview

Why this is helpful
I'm developing the test suite. As a prerequisite, I need to ensure deterministic outputs given a set of inputs, mainly when using the outputs of --fast-run as test inputs.

However, it appears that the only difference is the order of the lines in the output, not the actual content, so we can still use these outputs to run tests. The only advantage I can think of for making the order consistent is that we could quickly sanity check the outputs and say with confidence that there is no content difference. Suppose a bug sneaks in in the future and --fast-run starts producing files whose content differs between runs, perhaps because it samples a different set of inputs. This could cause tests to erroneously fail.

Background
I was originally going to do that by creating small, static input files that are a subset of the actual release files:

However, in the interest of time, I decided to check if --fast-run outputs were deterministic. If so, I don't need to do #96.

Current problem: snomed-parts.owl

The --fast-run outputs mostly are deterministic!
I ran a build multiple times in a row, generating these same outputs in different folders:

  • loinc-part-hierarchy-all.owl
  • loinc-part-list-all.owl
  • loinc-snomed-equiv.owl
  • loinc-term-primary-def.owl
  • loinc-term-supplementary-def.owl
  • loinc-terms-list-all.owl
  • snomed-parts.owl

I then compared them via md5. It appears that the only one that ever differs is snomed-parts.owl.
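The comparison above can be sketched roughly as follows. This is a minimal illustration and not the script actually used: the function names, the `*.owl` glob pattern, and the idea of passing two run folders are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(run_a: Path, run_b: Path, pattern: str = "*.owl") -> list[str]:
    """List file names whose digests differ between two run folders.

    A file that exists in only one of the two folders is also reported.
    """
    names = sorted(
        {p.name for p in run_a.glob(pattern)} | {p.name for p in run_b.glob(pattern)}
    )
    changed = []
    for name in names:
        file_a, file_b = run_a / name, run_b / name
        if not (file_a.is_file() and file_b.is_file()):
            changed.append(name)  # present in only one run
        elif md5_of_file(file_a) != md5_of_file(file_b):
            changed.append(name)
    return changed


# Hypothetical usage; the folder names are placeholders:
# print(differing_files(Path("output/run1"), Path("output/run2")))
```

Note that a byte-level digest flags a reordered file as different even when its content is the same, which is why a line-level check is needed as a follow-up.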

Analysis
There are two ways in which the output might be different between runs:
a. Content: Difference in class declarations, assertions, etc.
b. Order: Same exact content, but it appears in a different order.

From what I can ascertain so far, I believe 'b' is the case.
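To tell cases 'a' and 'b' apart mechanically, one option is to compare the lines of two outputs as multisets. The sketch below assumes that comparing at line granularity is meaningful for these files; `classify_difference` is a hypothetical helper, not part of the project. It uses `collections.Counter` rather than a plain `set` so that a line duplicated in one file but not the other still counts as a content difference.

```python
from collections import Counter


def classify_difference(lines_a: list[str], lines_b: list[str]) -> str:
    """Classify how two outputs differ: 'identical', 'order', or 'content'."""
    if lines_a == lines_b:
        return "identical"
    # Same lines with the same multiplicities, only arranged differently.
    if Counter(lines_a) == Counter(lines_b):
        return "order"
    return "content"
```

For two files, the inputs would be something like `Path(p).read_text().splitlines()` for each path.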

Sub-tasks

  • @joeflack4 Ensure that the difference observed is simply one of order and not content
    • Probably done adequately. I took a set difference over all lines of 2 different output files, and it was empty, so it appears the only difference is order. Note that I did this on the outputs from the default --fast-run, which only had 427 lines.
  • Fix so that the order is always the same.
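Until the ordering is fixed where the file is generated, tests could compare an order-insensitive digest instead of a raw md5. This is only a workaround sketch under the assumption that line order carries no meaning in these outputs; it is not the fix the second sub-task asks for, and `order_insensitive_md5` is a hypothetical name.

```python
import hashlib


def order_insensitive_md5(text: str) -> str:
    """Digest of the text's lines after sorting, so line order does not matter."""
    lines = sorted(text.splitlines())
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()
```

Two runs that differ only in line order then produce the same digest, while any added, removed, or changed line still changes it.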

Additional notes

While the focus of this issue is determinism of outputs with --fast-run, snomed-parts.owl also differs between runs (different order of lines in the output) when running without that flag.

Projects
Status: 1. Backlog

2 participants