Update to Perseus 5.0 texts #57
The 5.0 texts use the
and sometimes
The former pair is standard and expected, but potentially troublesome because U+2019 also stands for an apostrophe, which for us is a word character rather than punctuation. The latter is a strange choice but not actually a problem: we can distinguish U+02BC from apostrophe, and map to U+2018/U+2019 on output. See some discussion at https://github.com/PerseusDL/tei-conversion-tools/wiki/Greek-Betacode-to-Unicode-Transformations#problems ("the decision to use 02BC for apostrophe was an explicit one...") and PerseusDL/canonical-greekLit#1049 (comment).
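A rough illustration of the output mapping, as a minimal Python sketch. The names `normalize_quotes` and `extra` are hypothetical and not part of this project's code. Only the U+02BC to U+2019 mapping comes from the discussion here; any other character to remap is left to the caller.

```python
from typing import Dict, Optional

MODIFIER_LETTER_APOSTROPHE = "\u02bc"  # used as a quotation mark in some 5.0 texts
RIGHT_SINGLE_QUOTE = "\u2019"          # standard closing quote (also used as apostrophe)


def normalize_quotes(text: str, extra: Optional[Dict[str, str]] = None) -> str:
    """Map U+02BC to U+2019 on output (hypothetical helper).

    `extra` lets the caller supply further single-character mappings,
    for example for an opening-quote counterpart; none is assumed here.
    """
    table = {MODIFIER_LETTER_APOSTROPHE: RIGHT_SINGLE_QUOTE}
    if extra:
        table.update(extra)
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(table))
```

Because the mapping is applied only on output, U+02BC stays distinct from U+2019 during internal processing, so an apostrophe can still be treated as a word character.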
Of the texts we use, there is only one that looks to be unsuitable in its current form in 5.0, which is theocritus, tlg0005.tlg001.perseus-grc1.xml. The file is full of |
As of 46cc274f, it looks like the errors in theocritus have been corrected—at least the |
Looks like since PerseusDL/canonical-greekLit#1377 (2023-02-07), there's now a good version of Theocritus, tlg0005.tlg001.perseus-grc2.xml.
We currently source texts from Perseus Hopper a.k.a. Perseus 4.0. The Perseus project is transitioning towards Perseus 5.0, which sources Greek texts from:
The newer texts have advantages: