Close #LGVISIUM-83: Extract coordinates with non integer values #88
@@ -138,10 +138,10 @@ def draw_metadata(
 """
 # TODO associate correctness with the extracted coordinates in a better way
 coordinate_color = "green" if is_coordinate_correct else "red"
-coordinate_rect = fitz.Rect([5, 5, 200, 25])
+coordinate_rect = fitz.Rect([5, 5, 250, 30])
I needed to increase the size of the boxes; otherwise, the coordinates would not be printed because the box was too small.
Having looked at it a little more, I think I have a solution that keeps performance at the status quo without requiring the additional preprocessing hack. I would use the following regex:
r"(?:([12])[\.\s'‘’]{0,2})?(\d{3})[\.\s'‘’]{0,2}(\d{3})(?:\.(\d{1,}))?"
The difference is that IF we extract decimal digits, THEN the decimal point is required. The current regex always treats the decimal point as optional, regardless of whether any decimal digits follow.
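A minimal sketch of how the proposed regex behaves, assuming Python's `re` module. The helper `parse_coordinate` and the comparison pattern `OPTIONAL_POINT_PATTERN` are hypothetical. The latter only approximates "decimal point always optional" for illustration and is not necessarily the regex currently in the PR. The key case is a run of digits with no separator after the second block: the proposed regex stops there, while an optional point lets the trailing digits be read as decimals.

```python
import re
from typing import Optional, Pattern

# Regex proposed in the review comment (copied verbatim).
# Groups: 1 = optional leading "1" or "2", 2 and 3 = three-digit blocks,
#         4 = decimal digits, captured only after a literal "."
COORDINATE_PATTERN = re.compile(
    r"(?:([12])[\.\s'‘’]{0,2})?(\d{3})[\.\s'‘’]{0,2}(\d{3})(?:\.(\d{1,}))?"
)

# Hypothetical variant for comparison: the decimal point is optional ("\.?"),
# so any digits directly following the second block count as decimals.
OPTIONAL_POINT_PATTERN = re.compile(
    r"(?:([12])[\.\s'‘’]{0,2})?(\d{3})[\.\s'‘’]{0,2}(\d{3})(?:\.?(\d{1,}))?"
)


def parse_coordinate(
    text: str, pattern: Pattern[str] = COORDINATE_PATTERN
) -> Optional[float]:
    """Hypothetical helper: return the first coordinate found in `text` as a float."""
    match = pattern.search(text)
    if match is None:
        return None
    prefix, first, second, decimals = match.groups()
    # Drop the separators and glue the digit blocks back together.
    integer_part = (prefix or "") + first + second
    if decimals:
        return float(f"{integer_part}.{decimals}")
    return float(integer_part)


print(parse_coordinate("2'600'000.25"))                            # 2600000.25
print(parse_coordinate("26000001200000"))                          # 2600000.0
print(parse_coordinate("26000001200000", OPTIONAL_POINT_PATTERN))  # 2600000.12
```

With the proposed pattern, the decimal group can only match when a `.` is present. For that reason `"26000001200000"` yields `2600000.0` instead of the spurious `2600000.12`.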
LGTM!
Made it possible to extract coordinates with decimal digits by adapting the regex. Adapted the tests to this new case and added a new test file covering these cases.