Releases: VikParuchuri/marker
Flatten PDF, fix page separators, fix torch/transformers bugs
- Fix issues with transformers 4.46 and torch 2.5
- Improve page separators - they now appear at the start of the page and show the page number
- Flatten form fields into the PDF before extracting markdown
Fix table bug
- Fix bug that caused conversion to fail when start_page was set and the document had tables
Undo threads
- Revert threading, which causes issues on a small percentage of devices
Speedups, bug fixes
- Fix some edge-case OCR bugs
- ~20% end-to-end speedup by improving layout and text detection
Fix OCR bugs
- Fix bbox issue with OCR and resizing
- Fix issue with layout bboxes missing after OCR
Fix misc bugs
- Ensure we don't have 0 area table boxes
- Ensure fullymergedblock gets a valid input
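The zero-area check can be illustrated with a small sketch. This is not marker's actual code: the `(x0, y0, x1, y1)` box layout and the helper names `bbox_area` and `drop_empty_boxes` are assumptions made for illustration only.

```python
from typing import List, Sequence

# Hypothetical box layout: (x0, y0, x1, y1) in page coordinates.
BBox = Sequence[float]


def bbox_area(bbox: BBox) -> float:
    """Return the box area, treating inverted or collapsed boxes as zero."""
    x0, y0, x1, y1 = bbox
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def drop_empty_boxes(bboxes: List[BBox]) -> List[BBox]:
    """Keep only boxes that cover a non-zero area."""
    return [b for b in bboxes if bbox_area(b) > 0]
```

Filtering degenerate boxes before they reach a table model avoids cropping an empty image region, which is one plausible way such boxes could cause failures.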
Fix layout bugs
- Improve layout, which improves output quality
- Fix header level detection bugs
Fix OOM errors
- Add batch size for table recognition model to avoid OOM
- Enable configuring batch size
- Fix error with debugging
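As a rough illustration of why a batch size helps with OOM, the sketch below splits inputs into fixed-size chunks so that only one chunk is passed to the model at a time. The names `chunked`, `run_in_batches`, and `model_fn` are hypothetical and are not part of marker's API; the real option names may differ.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(
    items: Sequence[T],
    model_fn: Callable[[Sequence[T]], List[R]],  # hypothetical model call
    batch_size: int = 8,
) -> List[R]:
    """Run model_fn one batch at a time and concatenate the results."""
    results: List[R] = []
    for batch in chunked(items, batch_size):
        results.extend(model_fn(batch))
    return results
```

A smaller batch size lowers peak memory at the cost of more model calls, so making it configurable lets users tune it to their device.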
Bugfixes, output quality improvement
- Fix MPS bug with torch 2.5
- Fix heading bug with zero line blocks
- Improve output quality when visual boxes and text boxes are offset
Better tables, improved output quality, header levels
Tables!
- Integrate custom table model for better table rendering - this uses a new state-of-the-art open table model
Markdown output
- Adjust block detection to improve markdown output globally
- Assign layout labels to blocks more accurately, which improves quality globally
- Better line spacing in markdown output
- Push footnotes to end of page
Header levels
- Add detection for header levels like #, ##, etc.
- Add computed table of contents
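To show how header levels relate to a table of contents, here is a minimal sketch that derives one from markdown text. It assumes the table of contents is rebuilt by scanning `#`-style headings in the output; the release note does not say how marker computes it, and `compute_toc` is a hypothetical helper, not a marker function.

```python
import re
from typing import Dict, List

# One to six '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # code fence marker, built this way to keep the example fence-safe


def compute_toc(markdown: str) -> List[Dict[str, object]]:
    """Collect heading level and title for each markdown heading."""
    toc: List[Dict[str, object]] = []
    in_fence = False
    for line in markdown.splitlines():
        # Skip fenced code so '#' comments are not mistaken for headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            toc.append({"level": len(match.group(1)), "title": match.group(2)})
    return toc
```

For example, a document with a `#` title followed by a `##` section yields two entries with levels 1 and 2.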
Bugfixes/misc
- Fix bug with pagination not working
- Much better debugging with debug image output
- Python 3.13 support