-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close LGVISIUM-52: Remove grouping by page #70
Conversation
e6059c1
to
29fa7b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed two commits:
- one fixing 2 TODOs in predictions.py (many thanks for highlighting those!)
- one addressing the last point in the JIRA ticket:
- While we are at it, can we remove the postprocessing (lowercase, removing whitespace) from the "text" field of the material description (e.g. "text": "kiesmitvielsandwenigsiltundtonbraungrau")? It would make more sense to me if this postprocessing is only done for the purposes of evaluation / comparing with ground truth, but the predictions.json file should actually contain the original text from the document without postprocessing (as is already the case for the individual lines).
Remaining points from the ticket:
- Importing the extracted data in Label Studio still works
- Did you check this @dcleres ?
- README documentation is updated
- The README currently contains a JSON example with the old structure
I'm not a big fan of the fact that we had to add page_number=1
in so many unit tests that are not really concerned with page number logic at all. However, I don't really see an immediate solution to this. Unless you see an easy way to avoid this, I'm happy to accept it like this.
The other change requests that I've added are all pretty minor, I think.
I found during the review that the file: |
Thank you for your comment regarding the PR. I resolved all the issues you highlighted which are relevant to this part of the code. The changes required to the label studio repository will be merged into the respective repository.
Regarding this comment, I fully agree with you, but I also have not found a fundamentally better way to solve this issue. I edited the tests and added a constant called While doing this, I also realized that we are not performing any tests on files with more than one page. @stijnvermeeren-swisstopo, should a test be added for this as well? |
129a8e2
to
e8221f6
Compare
That would certainly be useful as well, but it's not a top priority. If you want to work on that, I would certainly do it in a separate PR, so that we can close this one already now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a minor simplification of the page dimensions code (4404eff).
Everything looks good to me now.
Remove the page based logic in the code. The
prediction.json
has now the following shape:Behind the scenes, in the code, the page-based logic was removed, and a File-based logic was introduced.
Additional features:
launch.json
file that makes it possible to run the borehole extraction in debug mode.