Regions missed by layout analysis cannot be parsed into the result #1441
Labels
bug
Something isn't working
Comments
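In principle, the layout result is the foundation for every later stage of the pipeline and is treated as trusted, while PyMuPDF's block information is not trusted. For that reason we do not use PyMuPDF's block information to repair the layout blocks. A case like this can only be fixed by iterating on the layout model.
Can you provide a sample that reproduces this?
@ufxelv80 Your case is different. The one above is a missed detection by the layout model, whereas in yours the element is recognized as a footer. Elements that sit close to the top or bottom edge of the page get treated as headers or footers and are discarded.
How can I send it to you?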
Description of the bug
Layout analysis misses some regions, so when PyMuPDF is later used to fill in the characters, the content in those regions is discarded.
How to reproduce the bug
Could PyMuPDF's get_blocks be used as a supplementary check on the layout analysis result, adding back the text regions that layout analysis missed?
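To make the suggestion concrete, here is a minimal sketch of such a check. It is not how magic-pdf works (the maintainers state that PyMuPDF blocks are not trusted for repairing layout), so it only flags candidate regions for inspection. The function names, the `min_coverage` threshold, and the assumption that both sets of boxes are `(x0, y0, x1, y1)` tuples in the same coordinate space are all hypothetical. With PyMuPDF, the text block boxes could be taken from the first four items of each tuple returned by `page.get_text("blocks")`.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def box_area(box: BBox) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a: BBox, b: BBox) -> float:
    """Area of the overlap between two boxes, or zero if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def find_uncovered_blocks(
    text_blocks: List[BBox],
    layout_boxes: List[BBox],
    min_coverage: float = 0.5,  # hypothetical threshold, tune per document set
) -> List[BBox]:
    """Return text blocks that no single layout box covers well enough.

    Coverage is the fraction of the text block's area that falls inside
    the best-matching layout box (not the union of all layout boxes).
    """
    missed = []
    for block in text_blocks:
        area = box_area(block)
        if area == 0:
            continue  # skip empty or malformed boxes
        best = max(
            (intersection_area(block, lb) / area for lb in layout_boxes),
            default=0.0,
        )
        if best < min_coverage:
            missed.append(block)
    return missed
```

The returned boxes are only candidates: a block near the top or bottom edge may have been dropped on purpose as a header or footer, so the output would still need review before anything is merged back into the result.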
Operating system
Linux
Python version
3.10
Software version (magic-pdf --version)
0.10.x
Device mode
cuda