Determinism of outputs #114

joeflack4 · 2024-11-13T02:26:16Z

Overview

Why this is helpful
I'm developing the test suite. As a prerequisite, I need to ensure deterministic outputs given a set of inputs, mainly when using the outputs of --fast-run as test inputs.

However, it appears that the difference is likely only the order of the lines in the output, not the actual content, so we can still use these outputs to run tests. The only advantage I can think of for ensuring that the order is the same is that we can quickly sanity check the outputs and say with confidence that there is no content difference. Suppose a bug sneaks in in the future and, when running --fast-run, the content of these files does start to differ, perhaps because it samples a different set of inputs. This could cause tests to erroneously fail.

Background
I was originally going to do that by creating small, static input files that are a subset of the actual release files:

Tests: E2E: Static files & GH action #96

However, in the interest of time, I decided to check if --fast-run outputs were deterministic. If so, I don't need to do #96.

Current problem: `snomed-parts.owl`

They mostly are!
I ran a build multiple times in a row, generating these same outputs in different folders:

loinc-part-hierarchy-all.owl
loinc-part-list-all.owl
loinc-snomed-equiv.owl
loinc-term-primary-def.owl
loinc-term-supplementary-def.owl
loinc-terms-list-all.owl
snomed-parts.owl

I then compared them via md5. It appears that the only one that ever differs is snomed-parts.owl.
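For reference, the md5 comparison can be scripted roughly as below. This is a minimal sketch, not the exact commands I ran: the function names and the idea of passing two run folders are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_outputs(run_a: Path, run_b: Path, pattern: str = "*.owl") -> list[str]:
    """List file names that are missing from one run folder or whose digests differ.

    `run_a` and `run_b` are hypothetical folders, each holding the outputs of one build.
    """
    names = {p.name for p in run_a.glob(pattern)} | {p.name for p in run_b.glob(pattern)}
    differing = []
    for name in sorted(names):
        a, b = run_a / name, run_b / name
        if not (a.is_file() and b.is_file()) or md5_of(a) != md5_of(b):
            differing.append(name)
    return differing
```

Calling `differing_outputs(Path("run1"), Path("run2"))` on two such folders returns the names of the files that do not match byte for byte. A digest mismatch only says that the bytes differ; it does not say whether the difference is in content or in order.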

Analysis
There are two ways in which the output might be different between runs:
a. Content: Difference in class declarations, assertions, etc.
b. Order: Same exact content, but it appears in a different order.

From what I can ascertain so far, I believe 'b' is the case.
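Cases 'a' and 'b' can be told apart with a line-level comparison. The sketch below is an illustration only; `classify_difference` is a hypothetical helper, and it assumes that each declaration or assertion sits on its own line, which may not hold for every serialization.

```python
from collections import Counter
from pathlib import Path


def classify_difference(path_a: Path, path_b: Path) -> str:
    """Classify two files as 'identical', 'order' (same lines, reordered) or 'content'."""
    lines_a = path_a.read_text(encoding="utf-8").splitlines()
    lines_b = path_b.read_text(encoding="utf-8").splitlines()
    if lines_a == lines_b:
        return "identical"
    # Counter compares lines as a multiset, so a line that is duplicated
    # in one file but not the other still counts as a content difference.
    if Counter(lines_a) == Counter(lines_b):
        return "order"
    return "content"
```

A multiset is used rather than a plain set so that duplicated or dropped repeats of a line are not silently ignored.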

Sub-tasks

@joeflack4 Ensure that the difference observed is simply one of order and not content.
- Probably did this adequately. I did a set difference on all lines in 2 different files, and there was no difference, so it appears the only difference is order. Note that I did this on the outputs from the default --fast-run, which only had 427 lines.
Fix so that the order is always the same.
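For the second sub-task, the cause of the reordering has not been identified in this issue. One common cause in Python code is iterating over a `set` of strings, whose iteration order can change between processes because string hashing is randomized by default. If something like that turns out to be involved, sorting before writing is one way to pin the order. The sketch below assumes a hypothetical `write_lines_sorted` helper that receives the output lines as strings; it is not the project's actual writer.

```python
from pathlib import Path
from typing import Iterable


def write_lines_sorted(lines: Iterable[str], out_path: Path) -> None:
    """Write lines in sorted order so the file does not depend on iteration order."""
    ordered = sorted(lines)
    out_path.write_text("".join(f"{line}\n" for line in ordered), encoding="utf-8")
```

With this, the same collection of lines yields the same file regardless of the order in which they were produced. Sorting every line is only appropriate where line order carries no meaning, so in a real OWL file any header or wrapper lines would need to stay in place and only the order-free part would be sorted.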

Additional notes

While the focus of this issue is determinism of outputs with --fast-run, I also observed that snomed-parts.owl differs between runs (in the order of its lines) when running without that flag.

joeflack4 assigned joeflack4 and ShahimEssaid Nov 13, 2024

joeflack4 added stability qc / test labels Nov 13, 2024

joeflack4 added this to CompLOINC Nov 13, 2024

github-project-automation bot moved this to 1. Backlog in CompLOINC Nov 13, 2024

This was referenced Nov 13, 2024

Tests: E2E: Static files & GH action #96

Closed

Tests: Create static inputs #92

Closed

Tests: Improve n of SNOMED outputs #116

Open
