-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
--fast-run
thoroughness
#110
Conversation
src/comp_loinc/module.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current performance
Claude parsed the log and here's how long it's taking to run now.
Option A: W/ --fast-run
on all methods
This is what I have set up currently. Not sure why it took me longer to run locally but, total time taken:
- Locally (table below): 11m
- GH action: 5.5m
Module | Duration (minutes) |
---|---|
loinc-part-hierarchy-all | 0.1 |
snomed-parts | 0.1 |
loinc-part-list-all | 0.4 |
loinc-snomed-equiv | 0.4 |
loinc-term-primary-def | 2.8 |
loinc-term-supplementary-def | 3.9 |
loinc-terms-list-all | 3.2 |
Option B: W/ --fast-run
on just 3 main methods
W/ constraints on just 3 methods
The only ones Shahim set originally to --fast-run
:
loinc_terms_all()
loinc_parts_all()
Joe added. This one was taking the most time:
- SNOMED Builder:
loinc_part_mapping_instances()
Module | Duration (minutes) |
---|---|
loinc-part-list-all | 0.4 |
loinc-snomed-equiv | 0.5 |
snomed-parts | 1.1 |
loinc-term-primary-def | 2.9 |
loinc-terms-list-all | 3.4 |
loinc-term-supplementary-def | 4.0 |
loinc-part-hierarchy-all | 7.4 |
Log
This log actually represents not the full time it would have taken as shown in the table above. The table above is an estimate.
While timing it, I I had temporarily modified save_to_owl()
to run faster:
ents = list(self.runtime.current_module.get_all_entities())
print('len entities (if over 5k, limit to that)', len(ents))
ents = ents[:5000]
I determined that owl_dumper.to_ontology_document()
is O(n), so the durations shown in the table above are based off of that.
2024-11-10 18:22:13 INFO Module Starting: loinc-part-list-all
Running load_linkml_schema
2024-11-10 18:22:36 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 100
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-part-list-all.owl
2024-11-10 18:22:38 INFO LoincBuilder save-owl: Done
2024-11-10 18:22:38 INFO Module Starting: loinc-part-hierarchy-all
Running load_linkml_schema
2024-11-10 18:22:49 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 116361
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-part-hierarchy-all.owl
2024-11-10 18:23:08 INFO LoincBuilder save-owl: Done
2024-11-10 18:23:09 INFO Module Starting: loinc-terms-list-all
2024-11-10 18:23:09 INFO LoincBuilder Starting lt-inst-all
2024-11-10 18:23:20 INFO LoincBuilder Finished 1000
2024-11-10 18:23:20 INFO LoincBuilder Finished 2000
2024-11-10 18:23:20 INFO LoincBuilder Finished 3000
2024-11-10 18:23:20 INFO LoincBuilder Finished 4000
2024-11-10 18:23:20 INFO LoincBuilder Finished 5000
2024-11-10 18:23:20 INFO LoincBuilder Finished lt-inst-all
2024-11-10 18:23:21 INFO LoincBuilder Starting lt-class-roots
2024-11-10 18:23:21 INFO LoincBuilder Finished 1000
2024-11-10 18:23:21 INFO LoincBuilder Finished 2000
2024-11-10 18:23:21 INFO LoincBuilder Finished 3000
2024-11-10 18:23:21 INFO LoincBuilder Finished 4000
2024-11-10 18:23:21 INFO LoincBuilder Finished lt-class-roots
Running load_linkml_schema
2024-11-10 18:23:21 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 5104
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-terms-list-all.owl
2024-11-10 18:26:35 INFO LoincBuilder save-owl: Done
2024-11-10 18:26:35 INFO Module Starting: loinc-term-primary-def
2024-11-10 18:26:35 INFO LoincBuilder Starting lt-inst-all
2024-11-10 18:26:35 INFO LoincBuilder Finished 1000
2024-11-10 18:26:35 INFO LoincBuilder Finished 2000
2024-11-10 18:26:35 INFO LoincBuilder Finished 3000
2024-11-10 18:26:35 INFO LoincBuilder Finished 4000
2024-11-10 18:26:35 INFO LoincBuilder Finished 5000
2024-11-10 18:26:35 INFO LoincBuilder Finished lt-inst-all
Running load_linkml_schema
2024-11-10 18:27:11 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 4945
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-term-primary-def.owl
2024-11-10 18:29:26 INFO LoincBuilder save-owl: Done
2024-11-10 18:29:26 INFO Module Starting: loinc-term-supplementary-def
2024-11-10 18:29:26 INFO LoincBuilder Starting lt-inst-all
2024-11-10 18:29:26 INFO LoincBuilder Finished 1000
2024-11-10 18:29:26 INFO LoincBuilder Finished 2000
2024-11-10 18:29:26 INFO LoincBuilder Finished 3000
2024-11-10 18:29:27 INFO LoincBuilder Finished 4000
2024-11-10 18:29:27 INFO LoincBuilder Finished 5000
2024-11-10 18:29:27 INFO LoincBuilder Finished lt-inst-all
Running load_linkml_schema
2024-11-10 18:30:57 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 4945
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-term-supplementary-def.owl
2024-11-10 18:33:27 INFO LoincBuilder save-owl: Done
2024-11-10 18:33:27 INFO Module Starting: loinc-snomed-equiv
Running load_linkml_schema
2024-11-10 18:33:34 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 11976
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/loinc-snomed-equiv.owl
2024-11-10 18:33:47 INFO LoincBuilder save-owl: Done
2024-11-10 18:33:47 INFO Module Starting: snomed-parts
Running load_linkml_schema
2024-11-10 18:34:53 INFO LoincBuilder save-owl: Starting
len entities (if over 5k, limit to that) 15
Writing file: /Users/joeflack4/projects/comploinc/output/build-default/fast-run/snomed-parts.owl
2024-11-10 18:34:54 INFO LoincBuilder save-owl: Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the most major performance gain?
snomed-parts
was taking 30 minutes. Now 0.1 min.
If that's all we care about, I could revert back to option (B), which has less logic overall.
src/comp_loinc/module.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further optimization?
@ShahimEssaid Even on --fast-run
, the save_to_owl()
step is still taking quite a while sometimes. As can be seen in this comment, the longest one is loinc-part-hierarchy-all
, and this is determined by sheer n
entities ~116k.
- If we want
--fast-run
to be faster, what's the best way to modify the code such that then
entities in insave_to_owl()
is a lot lower?
- Joe added
--fast-run
processing to all methods, and that did speed up things up somewhat. This included lowern n ofsave_to_owl()
for some build steps. However, for some build steps, adding this additional logic did really lowern
insave_to_owl
and not speed things up significantly at all. Can see results in (comment w/ tables(--fast-run
thoroughness #110 (comment))). - Perhaps we should focus on (a) additionally, or (b) only restricting the total entities during the
save_to_owl()
step, rather than the individual builder methods? - Maybe
--fast-run
is fast enough for our current liking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I checked the runtime complexity of save_to_owl()
, and it is O(n).
88d6b6c
to
afe9b22
Compare
43e54b5
to
7211192
Compare
7211192
to
d8f82ff
Compare
7850298
to
5858d3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many methods to add --fast-run
processing to?
The only ones Shahim set originally to --fast-run
:
loinc_terms_all()
loinc_parts_all()
Joe added. This one was taking the most time:
- SNOMED Builder:
loinc_part_mapping_instances()
Joe has now added to all methods, but wondering:
- 1. Is the extra performance that helpful?
- 2. Does adding this processing to too many methods introduce any kind of inconsistencies between the files?
- ...which may pose more problematic if they are combined?
- If so, does it matter?
Compare: The performance gains between (A) only 3 methods, and (B) all methods (comment).
- Update: Lower n from 10k to 5k in some cases - Update: Add --fast-run to SNOMED builder
- Delete: Some temporary modifications to lower n in save_to_owl()
5858d3b
to
b4c9c18
Compare
- Add: --fast-run logic to more methods. - Add: config.py: Initialized with constants to use for controlling --fast-run.
b4c9c18
to
d59ec80
Compare
- Update: Fixed tests so that assertions match the lower n observed when running with the newly much more thorough --fast-run.
…ests - Updated the GH action workflow that does build & tests: - download inputs - download robot.jar - Add: Docs - Bug fix: Tests: Corrected the robot.jar command
QC GH Action: Finish setting up
Tests using `--fast-run`
Test passes after merging other branches into this one. Merging this now. |
Changes