Commit

Minor updates; improve docstrings.
redur committed May 27, 2024
1 parent 459c5d9 commit ce5d453
Showing 2 changed files with 10 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/stratigraphy/main.py
@@ -183,7 +183,7 @@ def start_pipeline(
if page_index > 0:
layer_predictions = remove_duplicate_layers(
doc[page_index - 1],
- doc[page_index],
+ page,
predictions[filename][f"page_{page_number - 1}"]["layers"],
layer_predictions,
matching_params["img_template_probability_threshold"],
10 changes: 9 additions & 1 deletion src/stratigraphy/util/duplicate_detection.py
@@ -21,6 +21,10 @@ def remove_duplicate_layers(
) -> list[dict]:
"""Remove duplicate layers from the current page based on the layers of the previous page.
+ We check if a layer on the current page is present on the previous page. If we have 3 consecutive layers that are
+ not duplicates, we assume that there is no further overlap between the pages and stop the search. If we find a
+ duplicate, all layers up to and including the duplicate layer are removed.
If the page contains a depth column, we compare the depth intervals and the material description to determine
duplicate layers. If there is no depth column, we use template matching to compare the layers.
@@ -32,7 +36,7 @@ def remove_duplicate_layers(
img_template_probability_threshold (float): The threshold for the template matching probability
Returns:
- list[dict]: _description_
+ list[dict]: The layers of the current page without duplicates.
"""
sorted_layers = sorted(current_layers, key=lambda x: x["material_description"]["rect"][1])
first_non_duplicated_layer_index = 0
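
The search described in the docstring above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `drop_leading_duplicates`, `is_duplicate` and `max_misses` are hypothetical names, and resetting the miss counter after each duplicate is an assumption about what "3 consecutive layers" means.

```python
from typing import Callable


def drop_leading_duplicates(
    layers: list[dict],
    is_duplicate: Callable[[dict], bool],
    max_misses: int = 3,
) -> list[dict]:
    """Return the layers that remain after removing the overlap with the previous page."""
    # Sort top to bottom by the upper y coordinate of the description rectangle,
    # mirroring the sort key visible in the diff.
    sorted_layers = sorted(layers, key=lambda x: x["material_description"]["rect"][1])

    first_non_duplicated_layer_index = 0
    misses = 0  # consecutive non-duplicates seen so far (assumed to reset on a hit)
    for index, layer in enumerate(sorted_layers):
        if is_duplicate(layer):
            # Everything up to and including this layer is dropped.
            first_non_duplicated_layer_index = index + 1
            misses = 0
        else:
            misses += 1
            if misses >= max_misses:
                break  # assume there is no further overlap between the pages
    return sorted_layers[first_non_duplicated_layer_index:]
```

The `is_duplicate` callback stands in for either comparison strategy mentioned in the docstring (depth interval plus description, or template matching), so the stopping rule can be exercised without any PDF input.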
@@ -96,6 +100,10 @@ def check_duplicate_layer_by_template_matching(
) -> bool:
"""Check if the current layer is a duplicate of a layer on the previous page by using template matching.
+ This is done by extracting an image of the layer and checking whether that image is present in the previous page
+ by applying template matching to the previous page.
Args:
previous_page (fitz.Page): The previous page.
current_page (fitz.Page): The current page.
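
To illustrate the idea of template matching mentioned in the docstring, here is a dependency-free sketch using zero-mean normalized cross-correlation on small grayscale grids. It is an assumption-laden toy: the project presumably renders page images and uses an image library, and the names `ncc_score`, `best_match` and `is_layer_on_previous_page`, as well as the strict `>` comparison against the threshold, are hypothetical.

```python
import math

Image = list[list[float]]  # grayscale image as rows of pixel intensities


def ncc_score(window: Image, template: Image) -> float:
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    w = [v for row in window for v in row]
    t = [v for row in template for v in row]
    w_mean = sum(w) / len(w)
    t_mean = sum(t) / len(t)
    num = sum((a - w_mean) * (b - t_mean) for a, b in zip(w, t))
    den = math.sqrt(
        sum((a - w_mean) ** 2 for a in w) * sum((b - t_mean) ** 2 for b in t)
    )
    # A flat patch has no structure to correlate with; score it as 0.
    return num / den if den > 0 else 0.0


def best_match(page: Image, template: Image) -> float:
    """Slide the template over the page and return the highest score found."""
    th, tw = len(template), len(template[0])
    best = -1.0
    for y in range(len(page) - th + 1):
        for x in range(len(page[0]) - tw + 1):
            window = [row[x : x + tw] for row in page[y : y + th]]
            best = max(best, ncc_score(window, template))
    return best


def is_layer_on_previous_page(page: Image, template: Image, threshold: float) -> bool:
    """Treat the layer as a duplicate if its best match exceeds the threshold."""
    return best_match(page, template) > threshold
```

Here `threshold` plays the role of `img_template_probability_threshold`: a layer image cut from the current page counts as a duplicate only if some position on the previous page matches it closely enough.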