Skip to content

Commit

Permalink
Merge pull request #90 from swisstopo/feature-README.md-update
Browse files Browse the repository at this point in the history
Update Run data extraction instruction
  • Loading branch information
lhaibach authored Oct 2, 2024
2 parents 8ecf055 + 4199e81 commit b164bb8
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,13 +103,14 @@ To execute the data extraction pipeline, follow these steps:

The main script for the extraction pipeline is located at `src/stratigraphy/main.py`. A cli command is created to run this script.

Run `boreholes-extract-all` to run the main extraction script. With the default options, the command will source all PDFs from the `data/Benchmark` directory and create PNG files in the `data/Benchmark/extract` directory.
Run `boreholes-extract-all` to run the main extraction script. You need to specify the input directory or a single PDF file using the `-i` or `--input-directory` flag.
The script will source all PDFs from the specified directory and create PNG files in the `data/output/draw` directory.

Use `boreholes-extract-all --help` to see all options for the extraction script.

4. **Check the results**

Once the script has finished running, you can check the results in the `data/Benchmark/extract` directory. The result is a `predictions.json` file as well as a png file for each page of each PDF in the `data/Benchmark` directory.
Once the script has finished running, you can check the results in the `data/output/draw` directory. The result is a `predictions.json` file as well as a png file for each page of each PDF in the specified input directory.

### Output Structure
The `predictions.json` file contains the results of a data extraction process from PDF files. Each key in the JSON object is the name of a PDF file, and the value is a list of extracted items in a dictionary like object. The extracted items for now are the material descriptions in their correct order (given by their depths).
Expand Down

1 comment on commit b164bb8

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage

Coverage Report
FileStmtsMissCoverMissing
src/stratigraphy
   __init__.py8188%11
   extract.py1861860%3–483
   get_files.py19190%3–47
   main.py1191190%3–310
src/stratigraphy/data_extractor
   data_extractor.py50394%32, 62, 98
src/stratigraphy/depthcolumn
   boundarydepthcolumnvalidator.py412051%47, 57, 60, 81–84, 110–128, 140–149
   depthcolumn.py1946467%25, 29, 50, 56, 59–60, 84, 87, 94, 101, 109–110, 120, 137–153, 191, 228, 247–255, 266, 271, 278, 309, 314–321, 336–337, 380–422
   depthcolumnentry.py28679%17, 21, 36, 39, 56, 65
   find_depth_columns.py1061982%42–43, 73, 86, 180–181, 225–245
src/stratigraphy/layer
   layer_identifier_column.py745230%16–17, 20, 28, 43, 47, 51, 59–63, 66, 74, 91–96, 99, 112, 125–126, 148–158, 172–199
src/stratigraphy/lines
   geometric_line_utilities.py86298%81, 131
   line.py51492%25, 50, 60, 110
   linesquadtree.py46198%75
src/stratigraphy/metadata
   coordinate_extraction.py108595%30, 64, 83–84, 96
src/stratigraphy/text
   description_block_splitter.py70297%24, 139
   extract_text.py29390%19, 53–54
   find_description.py642856%27–35, 50–63, 79–95, 172–175
   textblock.py80989%28, 56, 64, 89, 101, 124, 145, 154, 183
src/stratigraphy/util
   dataclasses.py32391%37–39
   interval.py1045547%29–32, 37–40, 46, 52, 56, 66–68, 107–153, 174, 180–196
   predictions.py1071070%3–282
   util.py391756%41, 69–76, 90–92, 116–117, 129–133
TOTAL164172556% 

Tests Skipped Failures Errors Time
79 0 💤 0 ❌ 0 🔥 5.849s ⏱️

Please sign in to comment.