How to use this repository #1

Open
MiracleDx opened this issue May 13, 2022 · 1 comment

Comments

@MiracleDx

I have some symbols such as ①②③④⑤. How can I train on them with your code?

@gumblex
Owner

gumblex commented May 13, 2022

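  1. Intersperse your additional characters throughout langdata/chi_sim/chi_sim.training_text, either automatically or by hand.
  2. Collect font files and put them in the fonts folder.
  3. Edit langdata/chi_sim/chi_sim.fontlist.txt (legacy model) or chi_sim.fontlist_lstm.txt (LSTM). You can get the font names with text2image --text=langdata/chi_sim/chi_sim.training_text --outputbase=chi_sim_test --fonts_dir=fonts --find_fonts --min_coverage=0.9 --render_per_font=false.
  4. Create a new folder somewhere else (with at least a few dozen GB of free space), set the environment variables TESSDATA_PREFIX (location of the tesseract data) and PATH (location of the tesseract command-line tools), then, from that folder, run python3 <this project's folder>/configure.py
  5. Run make -j10 (set the number of jobs to match your CPU core count).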
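A minimal sketch of the "automatic" option in step 1, in Python (the language of this project's configure.py). It assumes the training text is UTF-8 with one sample per line, and it inserts one extra symbol into every Nth non-empty line at a seeded random position, cycling through the symbols so each one appears. The helper names intersperse and augment_file, the insertion rate and the seed are hypothetical choices, not something the project prescribes. Whatever symbols you add, the fonts you pick in step 3 must be able to render them.

```python
import random
from pathlib import Path


def intersperse(lines, symbols, every=1, seed=0):
    """Insert one symbol into every `every`-th non-empty line.

    Symbols are used in turn (cycling), so each one appears as long as
    enough lines are selected. Hypothetical helper, not part of the project.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    out = []
    k = 0  # index of the next symbol to insert
    for i, line in enumerate(lines):
        if i % every == 0 and line:
            sym = symbols[k % len(symbols)]
            k += 1
            pos = rng.randint(0, len(line))  # 0..len(line), both ends allowed
            line = line[:pos] + sym + line[pos:]
        out.append(line)
    return out


def augment_file(path, symbols, every=1, seed=0):
    """Rewrite a training_text file in place (keep a backup of the original)."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines()
    new_text = "\n".join(intersperse(lines, symbols, every, seed)) + "\n"
    p.write_text(new_text, encoding="utf-8")
```

For example, run from this project's folder, augment_file("langdata/chi_sim/chi_sim.training_text", list("①②③④⑤"), every=3) would add one circled number to every third line. Scattering the symbols inside real sentences, rather than appending a separate block of them, keeps them in the kind of context the model will see at recognition time.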
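Steps 3 to 5 can also be driven from Python. This sketch only assembles the text2image flags, environment variables and make invocation given in the steps; the helper names (find_fonts_cmd, build_env, make_cmd, run_build) are hypothetical, and any directories you pass in are your own. Editing the fontlist file between step 3 and step 4 is still a manual step.

```python
import os
import subprocess


def find_fonts_cmd(training_text, outputbase="chi_sim_test",
                   fonts_dir="fonts", min_coverage=0.9):
    """argv for the text2image font-name lookup in step 3."""
    return [
        "text2image",
        f"--text={training_text}",
        f"--outputbase={outputbase}",
        f"--fonts_dir={fonts_dir}",
        "--find_fonts",
        f"--min_coverage={min_coverage}",
        "--render_per_font=false",
    ]


def build_env(tessdata_prefix, tesseract_bin_dir, base=None):
    """Environment for step 4: TESSDATA_PREFIX set, tesseract tools first on PATH."""
    env = dict(os.environ if base is None else base)
    env["TESSDATA_PREFIX"] = tessdata_prefix
    old_path = env.get("PATH")
    env["PATH"] = (tesseract_bin_dir + os.pathsep + old_path
                   if old_path else tesseract_bin_dir)
    return env


def make_cmd(jobs=None):
    """argv for step 5; the job count defaults to the number of CPUs."""
    return ["make", f"-j{jobs or os.cpu_count() or 1}"]


def run_build(project_dir, work_dir, env, jobs=None):
    """Steps 4 and 5: run configure.py inside a separate work folder, then make.

    work_dir should have tens of GB free. Call this only after the
    fontlist file has been edited (step 3).
    """
    os.makedirs(work_dir, exist_ok=True)
    configure = os.path.join(project_dir, "configure.py")
    subprocess.run(["python3", configure], cwd=work_dir, env=env, check=True)
    subprocess.run(make_cmd(jobs), cwd=work_dir, env=env, check=True)
```

For step 3, pass find_fonts_cmd("langdata/chi_sim/chi_sim.training_text") to subprocess.run with cwd set to this project's folder, since the paths in that command are relative. After updating the fontlist, call run_build with the same environment. Prepending the tesseract directory to PATH, rather than replacing PATH, keeps python3 and make reachable.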
