Lgvisium 100/extract depth value error handling #107

Merged
merged 12 commits into from
Dec 10, 2024

@lhaibach lhaibach commented Dec 4, 2024

Changes done:

  • In the extract_depth function, ValueError handling has been added, and the matched patterns are now more specific to groundwater depth. This change reduced the number of ValueErrors raised during float conversion, which previously caused the process to exit the loop prematurely. However, this refinement has led to an increase in false positives.

  • To reduce these false positives, the logic for finding groundwater information has been changed. The groundwater search is now tied directly to the material description bounding box, meaning groundwater data is only searched for in areas associated with material descriptions. This avoids unnecessary searches on pages without material descriptions and keeps groundwater extraction contextually related to the material description areas. Since groundwater details can sometimes appear at the end of a page or farther from the material description, the search around the bounding box is not overly strict.

  • To further reduce false positives, the logic for finding lines near keys was changed (impacting coordinate, elevation, and depth extraction). Instead of searching in a rectangle (which includes entries diagonal to the key), the updated code now searches along the x and y axes. This adjustment improves the F1 score for groundwater extraction while slightly worsening performance for elevation extraction (data/geoquat/validation/A11486.pdf).
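The difference between the rectangle search and the axis-based search in the last bullet can be sketched as follows. This is only an illustration under assumptions: `Rect`, `rectangle_search`, `axis_search`, and the `right`/`below` reach parameters are hypothetical names and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in page coordinates (y grows downwards). Hypothetical helper."""

    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # Two boxes overlap only if they overlap on both axes.
        return (
            self.x0 < other.x1
            and other.x0 < self.x1
            and self.y0 < other.y1
            and other.y0 < self.y1
        )


def rectangle_search(key: Rect, lines: list[Rect], right: float, below: float) -> list[Rect]:
    """Rectangle variant: one area spanning right of AND below the key.

    Entries diagonal to the key (lower right) are also returned.
    """
    area = Rect(key.x0, key.y0, key.x1 + right, key.y1 + below)
    return [line for line in lines if line.intersects(area)]


def axis_search(key: Rect, lines: list[Rect], right: float, below: float) -> list[Rect]:
    """Axis variant: one band along the x axis (same row as the key)
    and one band along the y axis (same column as the key)."""
    row_band = Rect(key.x0, key.y0, key.x1 + right, key.y1)
    column_band = Rect(key.x0, key.y0, key.x1, key.y1 + below)
    return [
        line
        for line in lines
        if line.intersects(row_band) or line.intersects(column_band)
    ]
```

With a key at `Rect(100, 100, 160, 112)`, a line directly to its right and a line directly below it are found by both variants, whereas a line to the lower right is only found by `rectangle_search`. That diagonal entry is the kind of candidate the bullet describes as a source of false positives.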

More in Ticket LGVISIUM-100
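For illustration, the ValueError handling during float conversion described in the first bullet might look roughly like the sketch below. The pattern, its keywords, and the function name `extract_groundwater_depths` are assumptions made for this example; the actual `extract_depth` implementation in the repository may differ.

```python
import re

# Hypothetical pattern: a groundwater keyword, then the first number-like token.
# The token is deliberately loose, so it can also capture dates such as "3.4.1985".
GROUNDWATER_DEPTH = re.compile(
    r"(?:grundwasser|groundwater)\D*?(\d[\d.,]*)",
    re.IGNORECASE,
)


def extract_groundwater_depths(lines: list[str]) -> list[float]:
    """Collect depth values from text lines, skipping tokens that are not valid floats."""
    depths: list[float] = []
    for line in lines:
        match = GROUNDWATER_DEPTH.search(line)
        if match is None:
            continue
        token = match.group(1).replace(",", ".")  # accept decimal commas
        try:
            depths.append(float(token))
        except ValueError:
            # A token like "3.4.1985" is not a depth. Skip it and keep scanning
            # instead of letting the exception end the loop early.
            continue
    return depths
```

Calling `extract_groundwater_depths(["Grundwasser bei 2,40 m", "Grundwasser 3.4.1985 gemessen", "groundwater at 5.1 m"])` skips the date-like token in the second line and still processes the third line, which is the behaviour the first bullet aims for.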

github-actions bot commented Dec 4, 2024

Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/stratigraphy | | | | |
| __init__.py | 8 | 1 | 88% | 11 |
| extract.py | 167 | 167 | 0% | 3–446 |
| get_files.py | 19 | 19 | 0% | 3–47 |
| main.py | 120 | 120 | 0% | 3–326 |
| src/stratigraphy/benchmark | | | | |
| metrics.py | 59 | 42 | 29% | 22–25, 29–32, 36–39, 46–49, 53–54, 58, 65–74, 78–91, 96–133 |
| src/stratigraphy/data_extractor | | | | |
| data_extractor.py | 76 | 4 | 95% | 32, 45, 120, 164 |
| utility.py | 6 | 3 | 50% | 28–36 |
| src/stratigraphy/depth | | | | |
| a_to_b_interval_extractor.py | 37 | 15 | 59% | 41–60, 79, 92 |
| depthcolumnentry.py | 15 | 3 | 80% | 19, 23, 35 |
| depthcolumnentry_extractor.py | 23 | 2 | 91% | 44–45 |
| interval.py | 101 | 52 | 49% | 26–29, 34–37, 43, 49, 53, 92–138, 159, 165–181 |
| src/stratigraphy/depths_materials_column_pairs | | | | |
| bounding_boxes.py | 30 | 10 | 67% | 23, 32, 50, 60, 72–78 |
| material_description_rect_with_sidebar.py | 18 | 8 | 56% | 27–41 |
| src/stratigraphy/evaluation | | | | |
| evaluation_dataclasses.py | 49 | 11 | 78% | 52, 71–74, 90, 104, 125–131, 147 |
| groundwater_evaluator.py | 48 | 1 | 98% | 77 |
| layer_evaluator.py | 66 | 46 | 30% | 29–30, 35–39, 47, 69–95, 105–113, 128–149 |
| metadata_evaluator.py | 37 | 14 | 62% | 46–65, 86–93 |
| utility.py | 16 | 7 | 56% | 43–52 |
| src/stratigraphy/groundwater | | | | |
| groundwater_extraction.py | 146 | 90 | 38% | 52, 94, 137–148, 180–184, 199–215, 226–314, 335–363 |
| utility.py | 42 | 36 | 14% | 10–17, 30–50, 62–76, 91–105 |
| src/stratigraphy/layer | | | | |
| layer.py | 36 | 13 | 64% | 25, 28, 36, 51–71 |
| src/stratigraphy/lines | | | | |
| geometric_line_utilities.py | 86 | 2 | 98% | 81, 131 |
| line.py | 51 | 4 | 92% | 25, 50, 60, 110 |
| linesquadtree.py | 46 | 1 | 98% | 75 |
| src/stratigraphy/metadata | | | | |
| coordinate_extraction.py | 106 | 4 | 96% | 29, 93–94, 106 |
| elevation_extraction.py | 90 | 60 | 33% | 34–39, 47, 55, 63, 79–87, 124–138, 150–153, 165–197, 212–220, 234–238 |
| language_detection.py | 18 | 13 | 28% | 17–23, 37–45 |
| metadata.py | 66 | 24 | 64% | 27, 83, 101–127, 146–155, 195–198, 206 |
| src/stratigraphy/sidebar | | | | |
| a_above_b_sidebar.py | 94 | 40 | 57% | 38, 44, 63–71, 82, 87, 94, 107, 112–119, 134–135, 177–218 |
| a_above_b_sidebar_extractor.py | 29 | 3 | 90% | 46–48 |
| a_above_b_sidebar_validator.py | 41 | 20 | 51% | 48, 58, 61, 81–84, 109–127, 139–148 |
| a_to_b_sidebar.py | 43 | 14 | 67% | 36, 49–50, 67, 95–108 |
| layer_identifier_sidebar.py | 51 | 32 | 37% | 23–24, 27, 59–78, 94–110, 122, 135 |
| layer_identifier_sidebar_extractor.py | 41 | 33 | 20% | 30–40, 54–86 |
| sidebar.py | 40 | 1 | 98% | 84 |
| src/stratigraphy/text | | | | |
| description_block_splitter.py | 70 | 2 | 97% | 24, 139 |
| extract_text.py | 29 | 3 | 90% | 19, 53–54 |
| find_description.py | 41 | 8 | 80% | 26–34, 111–114 |
| textblock.py | 90 | 11 | 88% | 22, 27, 39, 44, 71, 79, 104, 116, 139, 160, 189 |
| src/stratigraphy/util | | | | |
| dataclasses.py | 32 | 3 | 91% | 37–39 |
| predictions.py | 72 | 34 | 53% | 72, 95–115, 143–187 |
| util.py | 34 | 13 | 62% | 41, 69–76, 90–92, 116–117 |
| TOTAL | 2374 | 989 | 58% | |

Tests Skipped Failures Errors Time
100 0 💤 0 ❌ 0 🔥 7.634s ⏱️

src/stratigraphy/main.py (outdated, resolved)
src/stratigraphy/groundwater/groundwater_extraction.py (outdated, resolved)
src/stratigraphy/groundwater/groundwater_extraction.py (outdated, resolved)
src/stratigraphy/main.py (outdated, resolved)
src/stratigraphy/main.py (outdated, resolved)
tests/test_coordinate_extraction.py (outdated, resolved)

@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment

Looks good to me

@lhaibach lhaibach merged commit 9754ac6 into main Dec 10, 2024
3 checks passed
@lhaibach lhaibach deleted the LGVISIUM-100/extract_depth_ValueError_handling branch December 10, 2024 14:00