Update to Perseus 5.0 texts #57

Open
whoopsedesy opened this issue Apr 14, 2021 · 5 comments

Comments

@whoopsedesy
Collaborator

We currently source texts from Perseus Hopper a.k.a. Perseus 4.0. The Perseus project is transitioning towards Perseus 5.0, which sources Greek texts from:

The newer texts have advantages:

@whoopsedesy
Collaborator Author

whoopsedesy commented Apr 30, 2021

Here's the correspondence between what we're using now and the new texts (table scrolls right). The hymns have been divided into several files, which breaks our assumption that a "work" is a single file.

| SEDES label | Hopper | canonical-greekLit | Scaife |
|---|---|---|---|
| aratus | Perseus:text:2008.01.0483 | tlg0653.tlg001.perseus‑grc1 | |
| argonautica | Perseus:text:1999.01.0227 | tlg0001.tlg001.perseus‑grc2 | urn:cts:greekLit:tlg0001.tlg001.perseus‑grc2 |
| callimachushymns | Perseus:text:2008.01.0481 | tlg0533.tlg015.perseus‑grc1 (Zeus), tlg0533.tlg016.perseus‑grc1 (Apollo), tlg0533.tlg017.perseus‑grc1 (Artemis), tlg0533.tlg018.perseus‑grc1 (Delos), tlg0533.tlg019.perseus‑grc1 (Athena), tlg0533.tlg020.perseus‑grc1 (Demeter) | |
| homerichymns | Perseus:text:1999.01.0137 | tlg0013.tlg001.perseus‑grc2, tlg0013.tlg002.perseus‑grc2, tlg0013.tlg003.perseus‑grc2, tlg0013.tlg004.perseus‑grc2, tlg0013.tlg005.perseus‑grc2, tlg0013.tlg006.perseus‑grc2, tlg0013.tlg007.perseus‑grc2, tlg0013.tlg008.perseus‑grc2, tlg0013.tlg009.perseus‑grc2, tlg0013.tlg010.perseus‑grc2, tlg0013.tlg011.perseus‑grc2, tlg0013.tlg012.perseus‑grc2, tlg0013.tlg013.perseus‑grc2, tlg0013.tlg014.perseus‑grc2, tlg0013.tlg015.perseus‑grc2, tlg0013.tlg016.perseus‑grc2, tlg0013.tlg017.perseus‑grc2, tlg0013.tlg018.perseus‑grc2, tlg0013.tlg019.perseus‑grc2, tlg0013.tlg020.perseus‑grc2, tlg0013.tlg021.perseus‑grc2, tlg0013.tlg022.perseus‑grc2, tlg0013.tlg023.perseus‑grc2, tlg0013.tlg024.perseus‑grc2, tlg0013.tlg025.perseus‑grc2, tlg0013.tlg026.perseus‑grc2, tlg0013.tlg027.perseus‑grc2, tlg0013.tlg028.perseus‑grc2, tlg0013.tlg029.perseus‑grc2, tlg0013.tlg030.perseus‑grc2, tlg0013.tlg031.perseus‑grc2, tlg0013.tlg032.perseus‑grc2, tlg0013.tlg033.perseus‑grc2 | urn:cts:greekLit:tlg0013.tlg001.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg002.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg003.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg004.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg005.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg006.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg007.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg008.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg009.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg010.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg011.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg012.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg013.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg014.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg015.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg016.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg017.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg018.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg019.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg020.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg021.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg022.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg023.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg024.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg025.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg026.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg027.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg028.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg029.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg030.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg031.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg032.perseus‑grc2, urn:cts:greekLit:tlg0013.tlg033.perseus‑grc2 |
| iliad | Perseus:text:1999.01.0133 | tlg0012.tlg001.perseus‑grc2 | urn:cts:greekLit:tlg0012.tlg001.perseus‑grc2 |
| nonnusdionysiaca | Perseus:text:2008.01.0485 | tlg2045.tlg001.perseus‑grc1 | |
| odyssey | Perseus:text:1999.01.0135 | tlg0012.tlg002.perseus‑grc2 | urn:cts:greekLit:tlg0012.tlg002.perseus‑grc2 |
| quintussmyrnaeus | Perseus:text:2008.01.0490 | tlg2046.tlg001.perseus‑grc1 | |
| shield | Perseus:text:1999.01.0127 | tlg0020.tlg003.perseus‑grc2 | urn:cts:greekLit:tlg0020.tlg003.perseus‑grc2 |
| theocritus | Perseus:text:1999.01.0228 | tlg0005.tlg001.perseus‑grc2 | |
| theogony | Perseus:text:1999.01.0129 | tlg0020.tlg001.perseus‑grc2 | urn:cts:greekLit:tlg0020.tlg001.perseus‑grc2 |
| worksanddays | Perseus:text:1999.01.0131 | tlg0020.tlg002.perseus‑grc2 | urn:cts:greekLit:tlg0020.tlg002.perseus‑grc2 |
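
One way to accommodate works that span several files is to map each SEDES label to a list of file stems instead of a single one. The following is only a sketch of that idea: `WORK_FILES`, `numbered_stems`, and `cts_urn` are hypothetical names, not existing SEDES code, and only a few rows of the table are included. It assumes the file stems use ordinary ASCII hyphens.

```python
# Hypothetical sketch: every work is a list of canonical-greekLit file stems,
# so single-file works are simply one-element lists.

def numbered_stems(textgroup, first, last, version):
    """Build stems such as tlg0013.tlg001.perseus-grc2 for a numbered range."""
    return [f"{textgroup}.tlg{n:03d}.{version}" for n in range(first, last + 1)]

WORK_FILES = {
    "iliad": ["tlg0012.tlg001.perseus-grc2"],
    "odyssey": ["tlg0012.tlg002.perseus-grc2"],
    "callimachushymns": numbered_stems("tlg0533", 15, 20, "perseus-grc1"),
    "homerichymns": numbered_stems("tlg0013", 1, 33, "perseus-grc2"),
}

def cts_urn(stem):
    """Prefix a file stem to get the URN form shown in the Scaife column."""
    return "urn:cts:greekLit:" + stem
```

Code that previously opened one file per work would then loop over `WORK_FILES[label]` and concatenate the results, which keeps the single-file case unchanged.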

@whoopsedesy
Collaborator Author

The 5.0 texts use the q element less, preferring to mark quotations using in-text quotation marks. Oddly, they sometimes use

  • ‘ U+2018 LEFT SINGLE QUOTATION MARK
  • ’ U+2019 RIGHT SINGLE QUOTATION MARK

and sometimes

  • ʽ U+02BD MODIFIER LETTER REVERSED COMMA
  • ʼ U+02BC MODIFIER LETTER APOSTROPHE

The former pair is standard and expected, but potentially troublesome because U+2019 also stands for an apostrophe, which for us is a word character rather than punctuation. The latter is a strange choice but not actually a problem: we can distinguish U+02BC from apostrophe, and map to U+2018/U+2019 on output. See some discussion at https://github.com/PerseusDL/tei-conversion-tools/wiki/Greek-Betacode-to-Unicode-Transformations#problems ("the decision to use 02BC for apostrophe was an explicit one...") and PerseusDL/canonical-greekLit#1049 (comment).
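
A minimal sketch of the output mapping described above, assuming that U+02BD and U+02BC occur in these texts only as quotation marks. `normalize_quotes` is a hypothetical name, not an existing SEDES function. The sketch deliberately leaves U+2019 alone, since it cannot tell a closing quotation mark from an apostrophe.

```python
# Map the modifier-letter pair onto the standard single quotation marks.
# Assumption: U+02BD/U+02BC are used only as quotation marks in the input.
MODIFIER_TO_QUOTE = str.maketrans({
    "\u02bd": "\u2018",  # MODIFIER LETTER REVERSED COMMA -> LEFT SINGLE QUOTATION MARK
    "\u02bc": "\u2019",  # MODIFIER LETTER APOSTROPHE -> RIGHT SINGLE QUOTATION MARK
})

def normalize_quotes(text):
    """Return text with U+02BD/U+02BC replaced by U+2018/U+2019.

    Intended for the output stage, after word-level processing, so that the
    distinction from an apostrophe is still available while tokenizing.
    """
    return text.translate(MODIFIER_TO_QUOTE)
```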

@whoopsedesy
Collaborator Author

Of the texts we use, there is only one that looks to be unsuitable in its current form in 5.0, which is theocritus, tlg0005.tlg001.perseus-grc1.xml. The file is full of ? characters, signifying Beta Code decoding errors. It seems that whatever Beta Code parser was used to make the conversion to Unicode was not able to handle the unusual prepositioning of Beta Code diacritics used in this file. (A matter which we, too, encountered, and dealt with locally in c2c4150 and #9 (comment).) It might be possible to repair the Unicode using some heroic heuristics, but it would almost certainly be easier just to re-do the conversion from whatever source was used before.
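
A quick check for this kind of damage could count literal ? characters in the text content of a file. This is a sketch, not the check actually used here: `count_question_marks` is a hypothetical helper, and it assumes a ? in Greek text is suspicious because the Greek question mark is written as ;. Parsing the XML first keeps the `<?xml ...?>` declaration and other markup out of the count.

```python
import xml.etree.ElementTree as ET

def count_question_marks(xml_source):
    """Count '?' characters in the text nodes of an XML document.

    Markup (tags, attributes, processing instructions) is not counted,
    only character data reachable through itertext().
    """
    root = ET.fromstring(xml_source)
    return sum(chunk.count("?") for chunk in root.itertext())
```

A nonzero count does not prove a decoding error, since editorial notes may legitimately contain ?, but a large count in a verse text is a strong hint.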

@whoopsedesy
Collaborator Author

> Of the texts we use, there is only one that looks to be unsuitable in its current form in 5.0, which is theocritus, tlg0005.tlg001.perseus-grc1.xml.

As of 46cc274f, it looks like the errors in theocritus have been corrected; at least the ? characters are gone. The commits that changed it are:

@whoopsedesy
Collaborator Author

> Of the texts we use, there is only one that looks to be unsuitable in its current form in 5.0, which is theocritus, tlg0005.tlg001.perseus-grc1.xml. The file is full of ? characters, signifying Beta Code decoding errors. It seems that whatever Beta Code parser was used to make the conversion to Unicode was not able to handle the unusual prepositioning of Beta Code diacritics used in this file. (A matter which we, too, encountered, and dealt with locally in c2c4150 and #9 (comment).) It might be possible to repair the Unicode using some heroic heuristics, but it would almost certainly be easier just to re-do the conversion from whatever source was used before.

Looks like since PerseusDL/canonical-greekLit#1377 (2023-02-07), there's now a good version of Theocritus, tlg0005.tlg001.perseus-grc2.xml.
