Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--fast-run thoroughness #110

Merged
merged 7 commits into from
Dec 9, 2024
Merged

--fast-run thoroughness #110

merged 7 commits into from
Dec 9, 2024

Conversation

joeflack4
Copy link
Collaborator

@joeflack4 joeflack4 commented Nov 11, 2024

Changes

  • Update: Lower n from 10k to 5k in some cases
  • Update: Add --fast-run to SNOMED builder

Copy link
Collaborator Author

@joeflack4 joeflack4 Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current performance

Claude parsed the log and here's how long it's taking to run now.

Option A: W/ --fast-run on all methods

This is what I have set up currently. Not sure why it took me longer to run locally but, total time taken:

Module Duration (minutes)
loinc-part-hierarchy-all 0.1
snomed-parts 0.1
loinc-part-list-all 0.4
loinc-snomed-equiv 0.4
loinc-term-primary-def 2.8
loinc-term-supplementary-def 3.9
loinc-terms-list-all 3.2

Option B: W/ --fast-run on just 3 main methods

W/ constraints on just 3 methods

The only ones Shahim set originally to --fast-run:

  • loinc_terms_all()
  • loinc_parts_all()

Joe added. This one was taking the most time:

  • SNOMED Builder: loinc_part_mapping_instances()
Module Duration (minutes)
loinc-part-list-all 0.4
loinc-snomed-equiv 0.5
snomed-parts 1.1
loinc-term-primary-def 2.9
loinc-terms-list-all 3.4
loinc-term-supplementary-def 4.0
loinc-part-hierarchy-all 7.4

Log

This log actually represents not the full time it would have taken as shown in the table above. The table above is an estimate.

While timing it, I I had temporarily modified save_to_owl() to run faster:

        ents = list(self.runtime.current_module.get_all_entities())
        print('len entities (if over 5k, limit to that)', len(ents))
        ents = ents[:5000]

I determined that owl_dumper.to_ontology_document() is O(n), so the durations shown in the table above are based off of that.

2024-11-10 18:22:13 INFO     Module          Starting: loinc-part-list-all
Running load_linkml_schema
2024-11-10 18:22:36 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 100
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-part-list-all.owl
2024-11-10 18:22:38 INFO     LoincBuilder    save-owl: Done
2024-11-10 18:22:38 INFO     Module          Starting: loinc-part-hierarchy-all
Running load_linkml_schema
2024-11-10 18:22:49 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 116361
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-part-hierarchy-all.owl
2024-11-10 18:23:08 INFO     LoincBuilder    save-owl: Done
2024-11-10 18:23:09 INFO     Module          Starting: loinc-terms-list-all
2024-11-10 18:23:09 INFO     LoincBuilder    Starting lt-inst-all
2024-11-10 18:23:20 INFO     LoincBuilder    Finished 1000
2024-11-10 18:23:20 INFO     LoincBuilder    Finished 2000
2024-11-10 18:23:20 INFO     LoincBuilder    Finished 3000
2024-11-10 18:23:20 INFO     LoincBuilder    Finished 4000
2024-11-10 18:23:20 INFO     LoincBuilder    Finished 5000
2024-11-10 18:23:20 INFO     LoincBuilder    Finished lt-inst-all
2024-11-10 18:23:21 INFO     LoincBuilder    Starting lt-class-roots
2024-11-10 18:23:21 INFO     LoincBuilder    Finished 1000
2024-11-10 18:23:21 INFO     LoincBuilder    Finished 2000
2024-11-10 18:23:21 INFO     LoincBuilder    Finished 3000
2024-11-10 18:23:21 INFO     LoincBuilder    Finished 4000
2024-11-10 18:23:21 INFO     LoincBuilder    Finished lt-class-roots
Running load_linkml_schema
2024-11-10 18:23:21 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 5104
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-terms-list-all.owl
2024-11-10 18:26:35 INFO     LoincBuilder    save-owl: Done
2024-11-10 18:26:35 INFO     Module          Starting: loinc-term-primary-def
2024-11-10 18:26:35 INFO     LoincBuilder    Starting lt-inst-all
2024-11-10 18:26:35 INFO     LoincBuilder    Finished 1000
2024-11-10 18:26:35 INFO     LoincBuilder    Finished 2000
2024-11-10 18:26:35 INFO     LoincBuilder    Finished 3000
2024-11-10 18:26:35 INFO     LoincBuilder    Finished 4000
2024-11-10 18:26:35 INFO     LoincBuilder    Finished 5000
2024-11-10 18:26:35 INFO     LoincBuilder    Finished lt-inst-all
Running load_linkml_schema
2024-11-10 18:27:11 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 4945
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-term-primary-def.owl
2024-11-10 18:29:26 INFO     LoincBuilder    save-owl: Done
2024-11-10 18:29:26 INFO     Module          Starting: loinc-term-supplementary-def
2024-11-10 18:29:26 INFO     LoincBuilder    Starting lt-inst-all
2024-11-10 18:29:26 INFO     LoincBuilder    Finished 1000
2024-11-10 18:29:26 INFO     LoincBuilder    Finished 2000
2024-11-10 18:29:26 INFO     LoincBuilder    Finished 3000
2024-11-10 18:29:27 INFO     LoincBuilder    Finished 4000
2024-11-10 18:29:27 INFO     LoincBuilder    Finished 5000
2024-11-10 18:29:27 INFO     LoincBuilder    Finished lt-inst-all
Running load_linkml_schema
2024-11-10 18:30:57 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 4945
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-term-supplementary-def.owl
2024-11-10 18:33:27 INFO     LoincBuilder    save-owl: Done
2024-11-10 18:33:27 INFO     Module          Starting: loinc-snomed-equiv
Running load_linkml_schema
2024-11-10 18:33:34 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 11976
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-snomed-equiv.owl
2024-11-10 18:33:47 INFO     LoincBuilder    save-owl: Done
2024-11-10 18:33:47 INFO     Module          Starting: snomed-parts
Running load_linkml_schema
2024-11-10 18:34:53 INFO     LoincBuilder    save-owl: Starting
len entities (if over 5k, limit to that) 15
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/snomed-parts.owl
2024-11-10 18:34:54 INFO     LoincBuilder    save-owl: Done

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's the most major performance gain?

snomed-parts was taking 30 minutes. Now 0.1 min.

If that's all we care about, I could revert back to option (B), which has less logic overall.

Copy link
Collaborator Author

@joeflack4 joeflack4 Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further optimization?

@ShahimEssaid Even on --fast-run, the save_to_owl() step is still taking quite a while sometimes. As can be seen in this comment, the longest one is loinc-part-hierarchy-all, and this is determined by sheer n entities ~116k.

  1. If we want --fast-run to be faster, what's the best way to modify the code such that the n entities in in save_to_owl() is a lot lower?
  • Joe added --fast-run processing to all methods, and that did speed up things up somewhat. This included lowern n of save_to_owl() for some build steps. However, for some build steps, adding this additional logic did really lower n in save_to_owl and not speed things up significantly at all. Can see results in (comment w/ tables(--fast-run thoroughness #110 (comment))).
  • Perhaps we should focus on (a) additionally, or (b) only restricting the total entities during the save_to_owl() step, rather than the individual builder methods?
  • Maybe --fast-run is fast enough for our current liking?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I checked the runtime complexity of save_to_owl(), and it is O(n).

@joeflack4 joeflack4 force-pushed the fast-run-updates branch 3 times, most recently from 88d6b6c to afe9b22 Compare November 11, 2024 01:55
@joeflack4 joeflack4 force-pushed the fast-run-updates branch 2 times, most recently from 43e54b5 to 7211192 Compare November 11, 2024 02:09
@joeflack4 joeflack4 force-pushed the fast-run-updates branch 2 times, most recently from 7850298 to 5858d3b Compare November 11, 2024 02:27
Copy link
Collaborator Author

@joeflack4 joeflack4 Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many methods to add --fast-run processing to?

The only ones Shahim set originally to --fast-run:

  • loinc_terms_all()
  • loinc_parts_all()

Joe added. This one was taking the most time:

  • SNOMED Builder: loinc_part_mapping_instances()

Joe has now added to all methods, but wondering:

  • 1. Is the extra performance that helpful?
  • 2. Does adding this processing to too many methods introduce any kind of inconsistencies between the files?
    • ...which may pose more problematic if they are combined?
    • If so, does it matter?

Compare: The performance gains between (A) only 3 methods, and (B) all methods (comment).

- Update: Lower n from 10k to 5k in some cases
- Update: Add --fast-run to SNOMED builder
- Delete: Some temporary modifications to lower n in save_to_owl()
@joeflack4 joeflack4 changed the title --fast-run updates --fast-run thoroughness Nov 11, 2024
- Add: --fast-run logic to more methods.
- Add: config.py: Initialized with constants to use for controlling --fast-run.
- Update: Fixed tests so that assertions match the lower n observed when running with the newly much more thorough --fast-run.
…ests

- Updated the GH action workflow that does build & tests:
  - download inputs
  - download robot.jar
- Add: Docs
- Bug fix: Tests: Corrected the robot.jar command
@joeflack4
Copy link
Collaborator Author

Test passes after merging other branches into this one. Merging this now.

@joeflack4 joeflack4 merged commit cdc81cf into main Dec 9, 2024
1 check passed
@joeflack4 joeflack4 deleted the fast-run-updates branch December 9, 2024 23:30
This was referenced Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 4. Done
Development

Successfully merging this pull request may close these issues.

1 participant