Releases: VikParuchuri/marker

Flatten PDF, fix page separators, fix torch/transformers bugs

25 Oct 17:04
b2cae2e
  • Fix issues with transformers 4.46 and torch 2.5
  • Improve page separators - they now appear at the start of the page and show the page number
  • Flatten form fields into the PDF before extracting markdown

Fix table bug

24 Oct 17:55
1b4b413
  • Fix bug that caused conversion to fail when start_page was set and the document had tables

Undo threads

23 Oct 17:37
8e28b05

  • Revert threading, which caused issues on a small % of devices

Speedups, bug fixes

23 Oct 16:30
6b25f06
  • Fix some edge case OCR bugs
  • ~20% end-to-end speedup by improving layout and text detection

Fix OCR bugs

23 Oct 02:12
2f3f0d7
  • Fix bbox issue with OCR and resizing
  • Fix issue with layout bboxes missing after OCR

Fix misc bugs

22 Oct 22:27
189d660
  • Ensure we don't have 0 area table boxes
  • Ensure fullymergedblock gets a valid input
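
As a rough illustration of the zero-area fix, a check on table bounding boxes might look like the sketch below. The `[x0, y0, x1, y1]` box layout and the helper names `box_area` and `drop_zero_area_boxes` are assumptions made for this example; they are not taken from marker's code.

```python
from typing import List, Sequence


def box_area(bbox: Sequence[float]) -> float:
    """Area of an [x0, y0, x1, y1] box; degenerate or inverted boxes give 0."""
    x0, y0, x1, y1 = bbox
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def drop_zero_area_boxes(bboxes: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only boxes with strictly positive area (hypothetical helper)."""
    return [b for b in bboxes if box_area(b) > 0]


# One valid box, one zero-width box, one inverted box.
boxes = [[0, 0, 10, 5], [3, 3, 3, 8], [4, 9, 2, 12]]
print(drop_zero_area_boxes(boxes))
```

Clamping each side to zero before multiplying means an inverted box (where `x1 < x0`) is treated as empty rather than producing a misleading positive area from two negative sides.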

Fix layout bugs

22 Oct 20:52
6bee852
  • Improve layout, which improves output quality
  • Fix header level detection bugs

Fix OOM errors

21 Oct 13:52
93a3ca6
  • Add batch size for table rec model to avoid OOM
  • Enable configuring batch size
  • Fix error with debugging
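
The general idea behind a batch size limit can be sketched as follows: rather than passing every table to the model in one call, the inputs are split into fixed-size chunks so peak memory stays bounded. The `iter_batches` helper and the placeholder `table_crops` list are hypothetical and only illustrate the technique; they do not reflect how marker configures or runs its table recognition model.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Placeholder inputs standing in for cropped table images.
table_crops = [f"table_{i}" for i in range(7)]
for batch in iter_batches(table_crops, batch_size=3):
    # A real pipeline would run the model on each batch here.
    print(len(batch), batch)
```

A smaller batch size lowers peak memory at the cost of more model calls, which is why exposing it as a setting helps on devices with limited memory.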

Bugfixes, output quality improvement

18 Oct 19:48
361f9b5
  • Fix MPS bug with torch 2.5
  • Fix heading bug with zero line blocks
  • Improve output quality when visual boxes and text boxes are offset

Better tables, improved output quality, header levels

17 Oct 22:37
ea845fd

Tables!

  • Integrate custom table model for better table rendering - this uses a new state-of-the-art open table model

Markdown output

  • Adjust block detection to improve markdown output globally
  • Assign layout labels to blocks in a better way - will improve quality globally
  • Better line spacing in markdown output
  • Push footnotes to end of page

Header levels

  • Add detection for header levels like #, ##, etc.
  • Add computed table of contents

Bugfixes/misc

  • Fix bug with pagination not working
  • Much better debugging with debug image output
  • Python 3.13 support