Close LGVISIUM-63: Extraction of the groundwater logo using computer vision #83

dcleres · 2024-09-20T12:06:33Z

This PR adds template matching to the code. This should make it possible to find groundwater information even when the respective keywords are absent.

In this PR, I also reviewed the keywords and removed those that were not used. Furthermore, I added a list of false-positive (FP) keywords, i.e. keywords that were leading to false positives in the detections.


github-actions · 2024-09-20T12:07:47Z

Coverage Report

File	Stmts	Miss	Cover	Missing
src/stratigraphy
__init__.py	8	1	88%	11
extract.py	186	186	0%	3–483
get_files.py	19	19	0%	3–47
main.py	117	117	0%	3–308
src/stratigraphy/data_extractor
data_extractor.py	57	3	95%	33, 66, 103
src/stratigraphy/depthcolumn
boundarydepthcolumnvalidator.py	41	20	51%	47, 57, 60, 81–84, 110–128, 140–149
depthcolumn.py	194	64	67%	25, 29, 50, 56, 59–60, 84, 87, 94, 101, 109–110, 120, 137–153, 191, 228, 247–255, 266, 271, 278, 309, 314–321, 336–337, 380–422
depthcolumnentry.py	28	6	79%	17, 21, 36, 39, 56, 65
find_depth_columns.py	106	19	82%	42–43, 73, 86, 180–181, 225–245
src/stratigraphy/layer
layer_identifier_column.py	74	52	30%	16–17, 20, 28, 43, 47, 51, 59–63, 66, 74, 91–96, 99, 112, 125–126, 148–158, 172–199
src/stratigraphy/lines
geometric_line_utilities.py	86	2	98%	81, 131
line.py	51	4	92%	25, 50, 60, 110
linesquadtree.py	46	1	98%	75
src/stratigraphy/metadata
coordinate_extraction.py	108	5	95%	30, 64, 94–95, 107
src/stratigraphy/text
description_block_splitter.py	70	2	97%	24, 139
extract_text.py	29	3	90%	19, 53–54
find_description.py	64	28	56%	27–35, 50–63, 79–95, 172–175
textblock.py	80	9	89%	28, 56, 64, 89, 101, 124, 145, 154, 183
src/stratigraphy/util
dataclasses.py	32	3	91%	37–39
interval.py	104	55	47%	29–32, 37–40, 46, 52, 56, 66–68, 107–153, 174, 180–196
predictions.py	107	107	0%	3–282
util.py	39	17	56%	41, 69–76, 90–92, 116–117, 129–133
TOTAL	1652	723	56%

Tests	Skipped	Failures	Errors	Time
82	0 💤	0 ❌	0 🔥	6.265s ⏱️


stijnvermeeren-swisstopo

I would not enable this functionality by default at the moment, as it is just too slow and does not lead to an improvement that is significant enough to justify this slow-down. So maybe we should add a config parameter that can be used to decide whether to apply template matching or not. Then we can keep the code for now, and create a follow-up ticket to look into ways to optimize it.

One potential idea would be to look into using the OpenCV implementation of template matching instead of the scikit-image one. Some people seem to claim that the former is more performant (e.g. https://www.reddit.com/r/opencv/comments/g8kdcs/question_the_speed_of_matchtemplate/).
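For context on where the time goes, here is a minimal, dependency-free sketch of what template matching computes: a zero-mean normalized cross-correlation score at every candidate position. It is not the code from this PR; the function name `match_template_ncc` and the list-of-rows image format are assumptions made for illustration. The nested loops show why a naive search over a full page is expensive, and library routines such as `cv2.matchTemplate` or `skimage.feature.match_template` are optimized implementations of the same idea.

```python
from math import sqrt


def match_template_ncc(image, template):
    """Return (best_score, (row, col)) for the best zero-mean NCC match.

    `image` and `template` are grayscale images given as lists of rows.
    Hypothetical helper for illustration only.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])

    # Centre the template once; it does not change between positions.
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_centered = [v - t_mean for v in t_vals]
    t_norm = sqrt(sum(v * v for v in t_centered))

    best_score, best_pos = -1.0, None
    # Brute force: every top-left position, every template pixel.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_centered = [v - p_mean for v in patch]
            p_norm = sqrt(sum(v * v for v in p_centered))
            if p_norm == 0 or t_norm == 0:
                continue  # flat patch or template: correlation is undefined
            dot = sum(p * t for p, t in zip(p_centered, t_centered))
            score = dot / (p_norm * t_norm)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos
```

The cost grows with the product of the image area and the template area, so any follow-up optimization would likely come from a faster library routine or from restricting the search to a smaller region of the page.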

src/stratigraphy/groundwater/groundwater_extraction.py

src/stratigraphy/groundwater/utility.py

stijnvermeeren-swisstopo · 2024-10-02T14:14:24Z

src/stratigraphy/groundwater/groundwater_extraction.py

+    search_left_factor: float = 3  # NOTE: check files 267125334-bp.pdf, 267125338-bp.pdf, and 267125339-bp.pdf if this
+    # value is too high, as it might lead to false positives


Yes, this, in combination with the new search_above_factor, indeed seems to lead to too many false positives (see e.g. 267125029-bp.pdf). But maybe the ongoing work in https://jira.swisstopo.ch/browse/LGVISIUM-77 will already make this more robust again?

Why exactly was it necessary to increase this value? I don't really understand what the files 267125334-bp.pdf, 267125338-bp.pdf, and 267125339-bp.pdf have to do with it.

The issue I was facing with the 267125334-bp.pdf, 267125338-bp.pdf, and 267125339-bp.pdf borehole profiles was that false positives were generated if the left search factor was too large. In these profiles, the algorithm would find the depth column and extract data from it.

I think the best option performance-wise would be to use the default values from the main branch.
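To make the trade-off concrete, here is a minimal sketch of how a left search factor could widen the area searched around a groundwater keyword. It assumes the factor is multiplied by the keyword's width; the actual logic in `groundwater_extraction.py` may differ, and `Rect`, `search_region`, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (hypothetical helper)."""

    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    def intersects(self, other: "Rect") -> bool:
        return (
            self.x0 < other.x1
            and other.x0 < self.x1
            and self.y0 < other.y1
            and other.y0 < self.y1
        )


def search_region(keyword: Rect, search_left_factor: float) -> Rect:
    """Extend the keyword box to the left by factor * keyword width (assumed rule)."""
    return Rect(
        keyword.x0 - search_left_factor * keyword.width,
        keyword.y0,
        keyword.x1,
        keyword.y1,
    )


# Made-up layout: a depth column sits to the left of the keyword.
keyword = Rect(200, 300, 260, 312)
depth_column = Rect(40, 0, 70, 800)

wide = search_region(keyword, search_left_factor=3)
narrow = search_region(keyword, search_left_factor=2)
```

In this made-up layout the wider region reaches the depth column while the narrower one does not, which is the kind of overlap that would let depth values be picked up as groundwater data.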


dcleres · 2024-10-14T12:31:06Z

I addressed the comments you raised, @stijnvermeeren-swisstopo. Thank you very much for the review. Template matching can now be run on demand by setting the environment variable IS_SEARCHING_GROUNDWATER_ILLUSTRATION.
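A minimal sketch of how such an opt-in flag could be read. Only the variable name comes from this PR; the helper name `is_template_matching_enabled` and the set of accepted truthy strings are assumptions.

```python
import os


def is_template_matching_enabled(environ=os.environ) -> bool:
    """Return True if the opt-in flag is set to a truthy string.

    Assumption: "1", "true" and "yes" (case-insensitive) enable the feature;
    anything else, including an unset variable, leaves it disabled.
    """
    value = environ.get("IS_SEARCHING_GROUNDWATER_ILLUSTRATION", "false")
    return value.strip().lower() in {"1", "true", "yes"}
```

Defaulting to disabled keeps the slower code path off unless someone explicitly asks for it, which matches the request in the review above.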

Even when not running the template matching, the metrics were improved:

Main branch run: vaunted-mink-342

pyproject.toml

src/stratigraphy/data_extractor/data_extractor.py

src/stratigraphy/groundwater/groundwater_extraction.py

dcleres · 2024-10-14T14:12:09Z

src/stratigraphy/groundwater/groundwater_extraction.py

+                        lines, page_number, terrain_elevation
+                    )
+                    if found_groundwater:
+                        logger.info("Confidence list: %s", confidence_list)


Do you find this logging helpful, @stijnvermeeren-swisstopo? In a previous iteration, we removed the logging for the case where no groundwater was found. We can also remove the logging for the case where groundwater is found, as I mostly use it for debugging purposes.
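One option between keeping and removing the message is to log it at debug level, so it only appears when debugging is switched on. A minimal sketch, assuming the standard `logging` module; the logger name `groundwater_sketch` and the function `report_confidences` are hypothetical.

```python
import logging

logger = logging.getLogger("groundwater_sketch")  # hypothetical logger name


def report_confidences(found_groundwater, confidence_list):
    """Emit the confidence list at DEBUG level, only when something was found."""
    if found_groundwater:
        logger.debug("Confidence list: %s", confidence_list)
```

With the logger at its usual INFO level nothing is printed; calling `logging.basicConfig(level=logging.DEBUG)` during a debugging session makes the message visible again.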


dcleres · 2024-10-14T17:01:51Z

@stijnvermeeren-swisstopo I believe I have implemented all the changes we discussed today. The main change is that the template matching now lives in its own separate file.

pyproject.toml

dcleres · 2024-10-17T09:31:16Z

@stijnvermeeren-swisstopo thank you for your review. I added groundwater_illustration_matching to the toml installation script.

pyproject.toml

stijnvermeeren-swisstopo

LGTM!

dcleres added 6 commits September 18, 2024 15:48

Added illustration matching to the extraction of groundwater data

4d82070

Merge branch 'main' of https://github.com/swisstopo/swissgeol-boreholes-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision

ac42a5d

Minor fix for the page and page number confusion

25b4032

Merge branch 'main' of https://github.com/swisstopo/swissgeol-boreholes-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision

ffd5902

Bug fixes and case by case improvements

e786895

Uploaded latest changes

283435a

dcleres requested a review from stijnvermeeren-swisstopo September 20, 2024 12:06

dcleres self-assigned this Sep 20, 2024

dcleres changed the title ~~Lgvisium 63 extraction of the groundwater logo using computer vision~~ Close LGVISIUM-63: Extraction of the groundwater logo using computer vision Sep 20, 2024

dcleres added 3 commits September 23, 2024 11:14

Fixed the removal of areas that have already been matched

3812178

Merge branch 'main' of https://github.com/swisstopo/swissgeol-boreholes-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision

80dae81

Added ski to the dependencies

bee7c57

stijnvermeeren-swisstopo requested changes Oct 2, 2024


stijnvermeeren-swisstopo reviewed Oct 2, 2024


dcleres closed this in ef84413 Oct 7, 2024

dcleres reopened this Oct 8, 2024

dcleres added 4 commits October 14, 2024 11:18

Addressed code review comments

d28489d

Merge branch 'main' of https://github.com/swisstopo/swissgeol-boreholes-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision

7791dda

Updated the search factors

0cb47a9

Reverted the extract depth function back to its implementation in main

a4a97ed

dcleres marked this pull request as ready for review October 14, 2024 12:31

stijnvermeeren-swisstopo requested changes Oct 14, 2024


code review

45bda63

dcleres commented Oct 14, 2024


dcleres added 2 commits October 14, 2024 16:41

Added changes related to the comments required during PR

2a39943

Moved the template matching to a specific file to separate it from the rest of the groundwater detection

8ceeeb2

dcleres requested a review from stijnvermeeren-swisstopo October 14, 2024 17:01

stijnvermeeren-swisstopo requested changes Oct 16, 2024


Address suggestion made during PR

f0c6838

dcleres requested a review from stijnvermeeren-swisstopo October 17, 2024 09:31

stijnvermeeren-swisstopo requested changes Oct 17, 2024


Changed underscore with hyphens in the toml file

ed6b6fb

stijnvermeeren-swisstopo approved these changes Oct 17, 2024


dcleres merged commit 204a50d into main Oct 17, 2024
3 checks passed

dcleres deleted the LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision branch October 21, 2024 07:56
