-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close LGVISIUM-63: Extraction of the groundwater logo using computer vision #83
Close LGVISIUM-63: Extraction of the groundwater logo using computer vision #83
Conversation
…es-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision
…es-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision
…es-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not enable this functionality by default at the moment, as it is just too slow and does not lead to an improvement that is significant enough to justify this slow-down. So maybe we should add a config parameter that can be used to decide whether to apply template matching or not. Then we can keep the code for now, and create a follow-up ticket to look into ways to optimize it.
One potential idea would be to look into using the OpenCV2 implementation of template matching instead of the scikit-learn one. Some people seem to claim that the former is more performant (e.g. https://www.reddit.com/r/opencv/comments/g8kdcs/question_the_speed_of_matchtemplate/).
search_left_factor: float = 3 # NOTE: check files 267125334-bp.pdf, 267125338-bp.pdf, and 267125339-bp.pdf if this | ||
# value is too high, as it might lead to false positives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this, in combination with the new search_above_factor
, indeed seems to lead to too many false positives (see e.g. 267125029-bp.pdf). But maybe the ongoing work in https://jira.swisstopo.ch/browse/LGVISIUM-77 will already make this more robust again?
Why was it necessary exactly to increase this value? I don't really understand what the files 267125334-bp.pdf, 267125338-bp.pdf, and 267125339-bp.pdf have to do with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I was facing with the 267125334-bp.pdf, 267125338-bp.pdf, and 267125339-bp.pdf bore profiles was that False Positives were generated if the left search factor was too large. In these profiles, the algorithm would find the depth column and extract data.
I think the best option performance-wise would be to use the default values from the main
branch.
…es-dataextraction into LGVISIUM-63-Extraction-of-the-Groundwater-logo-using-Computer-Vision
I addressed the comments you raised @stijnvermeeren-swisstopo . Thank you very much for the review. I added the possibility of running the template matching on demand by editing the environment variable Even when not running the template matching, the metrics were improved: main branch was run: |
lines, page_number, terrain_elevation | ||
) | ||
if found_groundwater: | ||
logger.info("Confidence list: %s", confidence_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this helpful logging according to you, @stijnvermeeren-swisstopo? In a previous iteration, we removed the logging in case no groundwater was found. We can also remove the logging when groundwater is found, as I mostly use it for debugging purposes.
…e rest of the groundwater detection
@stijnvermeeren-swisstopo I do believe I implemented all the changes we discussed today. Main change, the template matching is now in an independent and separate file. |
@stijnvermeeren-swisstopo thank you for your review. I add the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Addition of template matching to the code. This should make it possible to find groundwater information without the respective keywords.
In this PR, I also tried to remove the keywords that were not used and to remove them if needed. Furthermore, I added a list of FP keywords. This list contains keywords that were leading to false positives in the detections.