Close #LGVISIUM-77: Wrong elevation value selected within bounding box #91

Merged


@dcleres dcleres commented Oct 3, 2024

I started sorting the text lines matched around the keyword by their vertical position. With the lines sorted, the one closest to the keyword is tested first, which reduces the number of false positives and errors in elevation and groundwater detection.
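As a rough illustration of the idea (not the code from this PR), the sketch below orders candidate lines by the vertical distance between their centre and the keyword line's centre. The `TextLine` class, the `sort_by_vertical_distance` helper, and the sample texts and coordinates are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    """Hypothetical stand-in for an extracted text line and its bounding box."""

    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def y_center(self) -> float:
        # Vertical midpoint of the bounding box.
        return (self.y0 + self.y1) / 2


def sort_by_vertical_distance(key_line: TextLine, candidates: list[TextLine]) -> list[TextLine]:
    """Return the candidates ordered from vertically closest to farthest from key_line."""
    return sorted(candidates, key=lambda line: abs(line.y_center - key_line.y_center))


# Made-up sample data: one keyword line and three candidate value lines.
keyword = TextLine("Terrain elevation", 50, 100, 150, 112)
candidates = [
    TextLine("412.30 m", 300, 400, 360, 412),
    TextLine("557.20 m", 160, 101, 220, 113),
    TextLine("12.50 m", 160, 180, 220, 192),
]

ordered = sort_by_vertical_distance(keyword, candidates)
ordered_texts = [line.text for line in ordered]
print(ordered_texts)
```

With this ordering, a downstream matcher can test the candidates one by one and stop at the first plausible value, so a value on the keyword's own row is considered before values further up or down the page.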

@stijnvermeeren-swisstopo In today's stand-up, we discussed the possibility of sorting the text both vertically and horizontally. Currently, all the issues I found are solved by this code in combination with your improvements to the OCR processing. If you wish, I can still try that technique as well.

Screenshot 2024-10-03 at 09 30 24

The metrics from the main branch are in silent-quail-641; the metrics for this branch are in ambitious-mule-939.

@dcleres dcleres self-assigned this Oct 3, 2024

github-actions bot commented Oct 3, 2024

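Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| src/stratigraphy/__init__.py | 8 | 1 | 88% | 11 |
| src/stratigraphy/extract.py | 186 | 186 | 0% | 3–483 |
| src/stratigraphy/get_files.py | 19 | 19 | 0% | 3–47 |
| src/stratigraphy/main.py | 119 | 119 | 0% | 3–310 |
| src/stratigraphy/data_extractor/data_extractor.py | 52 | 3 | 94% | 32, 62, 98 |
| src/stratigraphy/depthcolumn/boundarydepthcolumnvalidator.py | 41 | 20 | 51% | 47, 57, 60, 81–84, 110–128, 140–149 |
| src/stratigraphy/depthcolumn/depthcolumn.py | 194 | 64 | 67% | 25, 29, 50, 56, 59–60, 84, 87, 94, 101, 109–110, 120, 137–153, 191, 228, 247–255, 266, 271, 278, 309, 314–321, 336–337, 380–422 |
| src/stratigraphy/depthcolumn/depthcolumnentry.py | 28 | 6 | 79% | 17, 21, 36, 39, 56, 65 |
| src/stratigraphy/depthcolumn/find_depth_columns.py | 106 | 19 | 82% | 42–43, 73, 86, 180–181, 225–245 |
| src/stratigraphy/layer/layer_identifier_column.py | 74 | 52 | 30% | 16–17, 20, 28, 43, 47, 51, 59–63, 66, 74, 91–96, 99, 112, 125–126, 148–158, 172–199 |
| src/stratigraphy/lines/geometric_line_utilities.py | 86 | 2 | 98% | 81, 131 |
| src/stratigraphy/lines/line.py | 51 | 4 | 92% | 25, 50, 60, 110 |
| src/stratigraphy/lines/linesquadtree.py | 46 | 1 | 98% | 75 |
| src/stratigraphy/metadata/coordinate_extraction.py | 108 | 5 | 95% | 30, 64, 94–95, 107 |
| src/stratigraphy/text/description_block_splitter.py | 70 | 2 | 97% | 24, 139 |
| src/stratigraphy/text/extract_text.py | 29 | 3 | 90% | 19, 53–54 |
| src/stratigraphy/text/find_description.py | 64 | 28 | 56% | 27–35, 50–63, 79–95, 172–175 |
| src/stratigraphy/text/textblock.py | 80 | 9 | 89% | 28, 56, 64, 89, 101, 124, 145, 154, 183 |
| src/stratigraphy/util/dataclasses.py | 32 | 3 | 91% | 37–39 |
| src/stratigraphy/util/interval.py | 104 | 55 | 47% | 29–32, 37–40, 46, 52, 56, 66–68, 107–153, 174, 180–196 |
| src/stratigraphy/util/predictions.py | 107 | 107 | 0% | 3–282 |
| src/stratigraphy/util/util.py | 39 | 17 | 56% | 41, 69–76, 90–92, 116–117, 129–133 |
| TOTAL | 1643 | 725 | 56% | |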

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 82 | 0 💤 | 0 ❌ | 0 🔥 | 6.478s ⏱️ |

@dcleres dcleres marked this pull request as draft October 3, 2024 08:02
@dcleres dcleres marked this pull request as ready for review October 3, 2024 13:31

@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment


Could you add a comment to https://jira.swisstopo.ch/browse/LGVISIUM-77 describing the decision to sort only by vertical distance? It should say that this already improves the results sufficiently, and that we therefore have not yet tested the idea of looking horizontally and vertically (but not diagonally) from the key line.
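For reference, one possible reading of the untested "horizontally and vertically, but not diagonally" idea is sketched below: a candidate is kept only if its bounding box shares a row (overlapping vertical extent) or a column (overlapping horizontal extent) with the key line. This is an assumption about what the idea could look like, not something implemented in this PR; the `Box` class, the helper names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical text line with an axis-aligned bounding box."""

    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def intervals_overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the open intervals (a0, a1) and (b0, b1) intersect."""
    return a0 < b1 and b0 < a1


def is_axis_aligned(key: Box, other: Box) -> bool:
    """True if other lies left/right of key (same row) or above/below it (same column)."""
    same_row = intervals_overlap(key.y0, key.y1, other.y0, other.y1)
    same_column = intervals_overlap(key.x0, key.x1, other.x0, other.x1)
    return same_row or same_column


# Made-up sample data.
key = Box("Terrain elevation", 50, 100, 150, 112)
candidates = [
    Box("557.20 m", 160, 101, 220, 113),   # to the right, on the same row
    Box("12.50 m", 60, 180, 120, 192),     # directly below, in the same column
    Box("412.30 m", 300, 400, 360, 412),   # diagonal: shares neither row nor column
]

kept_texts = [box.text for box in candidates if is_axis_aligned(key, box)]
print(kept_texts)
```

Such a filter could be applied before sorting by distance, so that diagonal neighbours are never considered at all.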

src/stratigraphy/data_extractor/data_extractor.py (outdated, resolved)

dcleres commented Oct 9, 2024

@stijnvermeeren-swisstopo I added a description to Jira of the rationale for sorting only vertically and of how this could be improved.

@dcleres dcleres changed the title Close LGVISIUM-77: Wrong elevation value selected within bounding box Close #LGVISIUM-77: Wrong elevation value selected within bounding box Oct 9, 2024

@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment


Thanks for adding the comment on the Jira ticket. LGTM!

@dcleres dcleres merged commit 1b2bb80 into main Oct 10, 2024
3 checks passed
@dcleres dcleres deleted the LGVISIUM-77-Wrong-elevation-value-selected-within-bounding-box branch October 23, 2024 08:39