Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LGVISIUM-102: common parent class "Sidebar" for LayerIdentifierColumn and DepthColumn #105

Merged
merged 21 commits into from
Nov 22, 2024

Conversation

stijnvermeeren-swisstopo
Copy link
Contributor

@stijnvermeeren-swisstopo stijnvermeeren-swisstopo commented Nov 11, 2024

A lot of lines changes, but a lot of it is simple renaming, moving code to different files and documentation.

Main renamed classes:

  • BoundaryDepthColumn -> AAboveBSidebar
  • LayerDepthColumn -> AToBSidebar
  • LayerIdentifierColumn -> LayerIdentifierSidebar

These three classes now extend from a common parent class (Sidebar).
Logic for finding these different "sidebars" in borehole profiles is contained in respective SidebarExtractor classes.
Code for all of these is now more cleanly contained in a single sidebar package.

The bounding boxes for depth columns and material descriptions, that are part of the predictions.json output, are now more elegantly represented by a new class BoundaryBoxes, removing the need to implement JSON serialization logic from the actual Sidebar (formerly DepthColumn) classes.

There is a very small / almost negligible effect on the benchmarking scores, caused by the fact that we no longer automatically reject material descriptions to which we cannot associate any depth interval.

This PR also updates the documentation for the predictions.json format, moving it to a separate markdown file, and removing a few redundant entries from the JSON output

Copy link

github-actions bot commented Nov 11, 2024

Coverage

Coverage Report
FileStmtsMissCoverMissing
src/stratigraphy
   __init__.py8188%11
   extract.py1671670%3–446
   get_files.py19190%3–47
   main.py1141140%3–307
src/stratigraphy/benchmark
   metrics.py594229%22–25, 29–32, 36–39, 46–49, 53–54, 58, 65–74, 78–91, 96–133
src/stratigraphy/data_extractor
   data_extractor.py74495%33, 46, 123, 168
src/stratigraphy/depthcolumn
   depthcolumnentry.py681282%21, 25, 37, 71–72, 86, 95, 107–109, 130, 143
src/stratigraphy/depths_materials_column_pairs
   bounding_boxes.py301067%23, 32, 50, 60, 72–78
   material_description_rect_with_sidebar.py18856%27–41
src/stratigraphy/evaluation
   evaluation_dataclasses.py491178%52, 71–74, 90, 104, 125–131, 147
   groundwater_evaluator.py48198%77
   layer_evaluator.py664630%29–30, 35–39, 47, 69–95, 105–113, 128–149
   metadata_evaluator.py371462%46–65, 86–93
   utility.py16756%43–52
src/stratigraphy/groundwater
   groundwater_extraction.py1569937%52, 94, 127–132, 140, 167–171, 186–206, 217–306, 322–354
   utility.py393315%10–17, 30–47, 59–73, 88–102
src/stratigraphy/layer
   layer.py371365%26, 29, 37, 52–72
src/stratigraphy/lines
   geometric_line_utilities.py86298%81, 131
   line.py51492%25, 50, 60, 110
   linesquadtree.py46198%75
src/stratigraphy/metadata
   coordinate_extraction.py106496%29, 93–94, 106
   elevation_extraction.py906033%34–39, 47, 55, 63, 79–87, 124–138, 150–153, 165–197, 212–220, 228–232
   language_detection.py181328%17–23, 37–45
   metadata.py662464%27, 83, 101–127, 146–155, 195–198, 206
src/stratigraphy/sidebar
   a_above_b_sidebar.py954058%39, 45, 64–72, 83, 88, 95, 108, 113–120, 135–136, 178–219
   a_above_b_sidebar_extractor.py29390%45–47
   a_above_b_sidebar_validator.py412051%48, 58, 61, 81–84, 109–127, 139–148
   a_to_b_sidebar.py471666%37, 40, 53–54, 71, 99–114
   layer_identifier_sidebar.py513237%23–24, 27, 59–78, 94–110, 122, 135
   layer_identifier_sidebar_extractor.py423321%30–40, 54–86
   sidebar.py40198%84
src/stratigraphy/text
   description_block_splitter.py70297%24, 139
   extract_text.py29390%19, 53–54
   find_description.py41880%26–34, 111–114
   textblock.py901188%22, 27, 39, 44, 71, 79, 104, 116, 139, 160, 189
src/stratigraphy/util
   dataclasses.py32391%37–39
   interval.py1146543%28–31, 36–39, 45, 51, 55, 94–140, 157, 163–179, 196–214
   predictions.py723453%72, 95–115, 143–187
   util.py341362%41, 69–76, 90–92, 116–117
TOTAL237799358% 

Tests Skipped Failures Errors Time
99 0 💤 0 ❌ 0 🔥 7.891s ⏱️

Base automatically changed from LGVISIUM-79/refactor-layers-object to main November 12, 2024 15:18
@stijnvermeeren-swisstopo stijnvermeeren-swisstopo changed the title LGVISIUM-102: LayerIdentifierColumn extends from DepthColumn LGVISIUM-102: common parent class "Sidebar" for LayerIdentifierColumn and DepthColumn Nov 19, 2024
@stijnvermeeren-swisstopo stijnvermeeren-swisstopo marked this pull request as ready for review November 19, 2024 13:39
Copy link
Contributor

@lhaibach lhaibach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the structure and naming -- it's much more intuitive for me now to understand which class does what.
Question regarding parent classes: AAboveBSidebar and AToBSidebar and LayerIdentifierSidebar are child classes of Sidebar, which makes perfect sense to me. For the corresponding extractors AAboveBSidebarExtractor and AtoBSidebarExtractor and LayerIdentifierSidebarExtractor this is not the case. How do you decide when to use parent classes for certain classes?
Regarding the #todo: automatic AAboveBInterval: will it be possible somehow to determine the interval type based on the sidebar type?

@stijnvermeeren-swisstopo
Copy link
Contributor Author

@lhaibach
Re SidebarExtractor: potentially it would make sense to have these extend from a common parent class. This might also be what the comment TODO: Similar to AToBSidebarExtractor.find_in_words(). Refactoring may be desired. in layer_identifier_sidebar_extractor.py is hinting at. However, I don't immediately see a significant amount of shared logic that we could move into a shared parent class. But feel free to give it a try, if you think there is potential to simplify the code like that.

Generally speaking, main reasons for using a parent class are:

  • Shared code between different classes
  • Allowing different classes to be passed as a value for the same parameter (with transparent type annotations)

Re the TODO in extract.py: I don't think the problem is determining the interval type (at this point in the code, we already have an interval instance with a specific type). I think the solution should be to allow any interval type (not just AAboveBInterval) in the Layer class, and to deal with any implications that this might have (in particular, concerning JSON serialization and deserialization).

@stijnvermeeren-swisstopo
Copy link
Contributor Author

I have now created a ticket for that last TODO: https://jira.swisstopo.ch/browse/LGVISIUM-104

@lhaibach
Copy link
Contributor

@stijnvermeeren-swisstopo thanks for explaining :) !

@stijnvermeeren-swisstopo stijnvermeeren-swisstopo merged commit b66b08e into main Nov 22, 2024
3 checks passed
@stijnvermeeren-swisstopo stijnvermeeren-swisstopo deleted the LGVISIUM-102/LayerIdentifierColumn branch November 22, 2024 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants