Resolve duplicate line warnings in nonnusdionysiaca.xml #79
25.528 was caused by an extraneous empty line.
The numbering was wrong when compared with a book scan. #79
28.83, 28.84, 28.93, 28.94 look like simple errors. Fixed in efeb4ad. https://archive.org/details/dionysiaca02nonnuoft/page/352/mode/1up These line numbers are also duplicates in Perseus 5.0.
32.90 is another extraneous empty line. Lines 85–90 are out of order and manually numbered, but it a…
https://archive.org/details/dionysiaca02nonnuoft/page/450/mode/1up Perseus 5.0 has the same error.
The duplicate 40.566 looks like it's just an error, perhaps caused by confusion around out-of-order line numbers and a typographically split line. The second instance of 40.566 in the TEI should have been 40.567 to match the printed version. Fixed in 7574d19. https://archive.org/details/dionysiaca03nonnuoft/page/192/mode/1up Perseus 5.0 has the same error.

As an aside, I'll note that the typographic split of 40.567 is not represented in the TEI, as the one at 37.625 is.
Caused by a line break that appears to have been added accidentally. Compare to a book scan: https://archive.org/details/dionysiaca02nonnuoft/page/450/mode/1up #79
Compared with a book scan (https://archive.org/details/dionysiaca03nonnuoft/page/192/mode/1up), the second instance of line number 566 should have been 567. This was perhaps complicated by line 567 being typographically split in the printed version, though unlike 37.625, here the markup does not indicate the typographical break. #79
The cause of duplicate 37.625 is clear: it's a typographically split line at 37.621 whose two parts were wrongly given separate line numbers 621 and 622. What's called "622" should be the remainder of 621, and the following lines up to 625 should be shifted down by one. There are two options for resolving it. @sasansom please advise.
In either case, we will also need a change in src/known.py. Currently it has the first part of 37.621 but not the second:

```python
'ὣς φαμένου': # Dion. 37.621
    (('ὣς', '+'), ('φαμένου', '--+')),
```

This is causing the second part to be scanned as a four-word all-spondee line.
With option (1), we'll need the second part of 37.621 to be added to src/known.py. With option (2), perhaps automatic scansion, having the whole line to work with, will get it right, and we can remove the entry from src/known.py.

https://archive.org/details/dionysiaca03nonnuoft/page/78/mode/1up
https://archive.org/details/dionysiaca03nonnuoft/page/80/mode/1up

Perseus 5.0 has the same error.
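To illustrate why the first part of 37.621 is not a complete line on its own, here is a minimal Python sketch (not code from this repository) that checks an entry shaped like the `src/known.py` one above and lists the metrical patterns that could complete it into a dactylic hexameter. It assumes `+` marks a long syllable and `-` a short one; the `KNOWN` dictionary and all helper names are hypothetical.

```python
from itertools import product

# Hypothetical dictionary with the same shape as the src/known.py entry quoted
# above: phrase -> tuple of (word, pattern). Assumption: '+' = long, '-' = short.
KNOWN = {
    "ὣς φαμένου":  # Dion. 37.621, first part only
        (("ὣς", "+"), ("φαμένου", "--+")),
}


def all_hexameters():
    """Yield every dactylic hexameter pattern: five feet that are each a
    dactyl (+--) or a spondee (++), then a final long plus anceps (+- or ++)."""
    for feet in product(("+--", "++"), repeat=5):
        for last in ("+-", "++"):
            yield "".join(feet) + last


HEXAMETERS = frozenset(all_hexameters())


def entry_pattern(phrase, words):
    """Check that an entry's words spell out its phrase; return the joined pattern."""
    if " ".join(word for word, _ in words) != phrase:
        raise ValueError(f"words do not spell out {phrase!r}")
    return "".join(pattern for _, pattern in words)


def possible_remainders(prefix):
    """Return every pattern that could complete `prefix` into a full hexameter."""
    return {h[len(prefix):] for h in HEXAMETERS if h.startswith(prefix)}


if __name__ == "__main__":
    for phrase, words in KNOWN.items():
        pattern = entry_pattern(phrase, words)
        # The first part alone is a valid start of a line, but not a whole line.
        print(phrase, pattern, pattern in HEXAMETERS, len(possible_remainders(pattern)))
```

Once the first part's pattern is fixed, the second part is constrained to one of the returned remainders; that constraint is lost if the second part is scanned as though it were an independent line.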
The consecutive lines that were formerly numbered 380 and 382 were united into one line numbered 380 in commit 2b4737c; however, the lines following were not adjusted to match, which later led to the warning `Dion.: duplicate line '39.385'`. This commit adjusts the numbers 383, 384, 385 (first instance) to be 382, 383, 384. Note, however, that this disagrees with a book scan, in which the second half of what here is a united line 380 is numbered 382 (may be an error in the book). https://archive.org/details/dionysiaca03nonnuoft/page/150/mode/1up #79
39.385 is fixed in commit 57971eb, but it deserves more comment. The cause appears to be the same as with 37.625: an earlier line that is typographically split. The two consecutive lines that have printed markers 380 and 382 are supposed to be only one metrical line. The two lines were united in commit 2b4737c#diff-c4e168f0263854513ee388716b85afe7d1dc35e0efe75cf605a6f3d4e179d920L17325-R17324; however, the numbering of the following lines was not adjusted accordingly (shifting 383 down to 382, 384 down to 383, 385 down to 384), which is why there was still a duplicate line warning. I guess we are going with the explanation that the printed line number 382 is a misprint that should have been placed one line lower. I made the line number adjustment in 57971eb.

I will note that the printed version has a capital Νότος, while we and Perseus 5.0 have a lower-case νότος. @sasansom, is this an error? https://archive.org/details/dionysiaca03nonnuoft/page/150/mode/1up The error is still present in Perseus 5.0. Already noted at #53 ("…
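The 39.385 fix amounts to detecting a repeated line number and then shifting a run of line numbers down by one after two lines were united. A minimal Python sketch of that bookkeeping, assuming the line numbers are available as a list in document order; the function names and the toy numbers are hypothetical and are not taken from the repository or the TEI.

```python
from collections import Counter


def duplicate_lines(line_ids):
    """Return the line identifiers that occur more than once, in first-seen order."""
    counts = Counter(line_ids)
    seen = set()
    duplicates = []
    for line_id in line_ids:
        if counts[line_id] > 1 and line_id not in seen:
            seen.add(line_id)
            duplicates.append(line_id)
    return duplicates


def shift_range(numbers, start, stop, delta=-1):
    """Return a copy of `numbers` with positions start..stop-1 shifted by `delta`."""
    return [n + delta if start <= i < stop else n for i, n in enumerate(numbers)]


if __name__ == "__main__":
    # Toy book: line 2 absorbed the former line 3, but the following
    # numbers were not adjusted, so 6 now appears twice.
    before = [1, 2, 4, 5, 6, 6, 7]
    print(duplicate_lines(before))        # [6]
    after = shift_range(before, 2, 5)     # renumber 4, 5, 6 (first) to 3, 4, 5
    print(after, duplicate_lines(after))  # [1, 2, 3, 4, 5, 6, 7] []
```

`duplicate_lines` works equally on identifiers such as `"39.385"`, which is the form the warning reports.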
I have fixed the ones that were clear and simple errors. Besides 37.625, which has two possible solutions, @sasansom I need your attention on 37.568 and 44.145.

37.568

I don't see a way to fix 37.568. On page 74, there is a line that follows a printed line number 567 (and is therefore implicitly numbered 568); then on page 78 there is a printed line number 568. The implicitly numbered 568 precedes a line with printed line number 572, but it cannot be that the line is supposed to be 571 instead, because there is another line 571 later on, on page 78.
https://archive.org/details/dionysiaca03nonnuoft/page/74/mode/1up https://archive.org/details/dionysiaca03nonnuoft/page/76/mode/1up https://archive.org/details/dionysiaca03nonnuoft/page/78/mode/1up Perseus 5.0 has the same problem. (It also happens to be missing line 37.600, which I guess is unrelated.)
44.145

There are 5 unnumbered lines between 140 and 145 in the book scan; then there is no line 147. Perhaps another misprint? https://archive.org/details/dionysiaca03nonnuoft/page/306/mode/1up https://archive.org/details/dionysiaca03nonnuoft/page/308/mode/1up Perseus 5.0 has the same problem.
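Both 37.568 and 44.145 come down to how unmarked lines are numbered implicitly by counting up from the last printed marker. A minimal Python sketch of that counting, assuming each printed line is represented by its printed marker or `None`; the function name and data layout are hypothetical. It reports every printed marker that disagrees with the running count.

```python
def implicit_numbers(markers, start):
    """Number every printed line from sparse printed line-number markers.

    markers: one entry per printed line, either the printed number (int) or
             None for a line with no marker.
    start:   the number to assume for the first line if it has no marker.

    Returns (numbers, conflicts). `conflicts` lists (index, expected, printed)
    wherever a printed marker disagrees with the count from the previous line.
    """
    numbers = []
    conflicts = []
    current = start - 1
    for index, printed in enumerate(markers):
        expected = current + 1
        if printed is not None and printed != expected:
            conflicts.append((index, expected, printed))
        # A printed marker always wins; otherwise keep counting up.
        current = expected if printed is None else printed
        numbers.append(current)
    return numbers, conflicts


if __name__ == "__main__":
    # 44.145 as described above: marker 140, five unnumbered lines, then 145.
    markers = [140, None, None, None, None, None, 145]
    numbers, conflicts = implicit_numbers(markers, start=140)
    print(numbers)    # [140, 141, 142, 143, 144, 145, 145]
    print(conflicts)  # [(6, 146, 145)]
```

Trusting the printed marker after a conflict reproduces the duplicate 145 that triggers the warning; the conflict list points at where the printed numbering and the line count diverge.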
Yes, I think it's an error. I've now capitalized Notos in commit 55fcfb4.
What's going on here is 1) Rouse, the editor of the Loeb edition, chose to transpose lines 37.568–71 to after 37.601 and before 37.602 (p. 78), but 2) at the top of p. 76 the edition incorrectly printed 572. It should have printed 573, or better yet, have printed 572 at the bottom of p. 74 (which I assume was the original intention before typesetting; it also corresponds to 37.572 in Keydell's edition [1959]). And there is a separate problem. The xml file does not include 37.600 anywhere in the text (corresponding to 600 at the top of p. 78); it seems to have dropped out entirely. I have now added it. I have fixed both in commit b180f53.
For 44.145, Rouse follows the edition of Ludwich (Teubner 1911), which transposes line 147 between 138 and 139. The problem is that the Loeb edition prints the verse numbers incorrectly; the sequence [138, 147, 139, 140] on p. 306 should be shifted down one line. I've fixed this in commit 7c1e467.
I've made sure that the commits resulting from this issue are reflected at #53.
The table of line totals formerly hardcoded the WORKS from Table 1 of "SEDES: Metrical Position in Greek Hexameter". But there have been changes to the corpus since then that affect line numbering, for example sasansom/sedes#77, sasansom/sedes#79, and sasansom/sedes@04dd4a1.

Furthermore, Table 1 in the "SEDES" article was produced by an xmlstarlet command that counts `l` and `lb` elements directly in the source TEI, not in the derived CSV files. In our notes for the table we remark that this is because duplicate line numbers make the CSV-based counts come out too low:

```
For future reference:

$ (echo "work,lines"; for a in corpus/*.xml; do echo "$a,$(xmlstarlet sel -t -m '//l' -v '"l"' -n -t -m '//lb' -v '"lb"' -n "$a" | wc -l)"; done) > corpus.csv

> x <- read.csv("corpus.csv")
> sum(x$lines)
[1] 73098
> summary(x$lines)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    479    1017    2434    6092    9628   21356

---

Table 1 numbers checked 2022-09-17, sedes commit cf795ef740.

---

> x <- bind_rows(map_dfr(Sys.glob("corpus/*.csv"), read_csv, col_types = cols(line_n = col_character(), book_n = col_character())))
> x %>% group_by(work) %>% summarize(n = n())

NB the line counts you get from counting distinct line numbers in the CSV are slightly different (smaller) from what you get from xmlstarlet, because of duplicated line numbers.

> x %>% select(work, book_n, line_n) %>% unique %>% nrow
[1] 72954
> x %>% select(work, book_n, line_n) %>% unique %>% group_by(work) %>% summarize(n = n())
```

In this repository I've started adding a workaround for the duplicate line numbers: counting up a line whenever word_n fails to increase within the same work, book_n, and line_n, in input order. But even with that, the automatically determined counts for Callim.Hymn and Q.S. are 1 smaller than they used to be, and unlike Dion. and Theoc., we have not made changes to those texts that should affect line count totals.
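The workaround described above can be sketched in Python roughly as follows, next to the naive distinct-key count it is meant to improve on. This is an illustration only, not the repository's code: the `Row` type, the function names, and the toy rows are hypothetical, and it assumes `word_n` has been parsed as an integer and that rows arrive in input (document) order.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    work: str
    book_n: str
    line_n: str
    word_n: int


def distinct_line_count(rows: Iterable[Row]) -> Counter:
    """Naive count: distinct (work, book_n, line_n) per work, so duplicates merge."""
    keys = {(r.work, r.book_n, r.line_n) for r in rows}
    return Counter(work for work, _, _ in keys)


def workaround_line_count(rows: Iterable[Row]) -> Counter:
    """Count a new line whenever (work, book_n, line_n) changes, or stays the
    same but word_n fails to increase (a repeated line number)."""
    counts = Counter()
    prev_key = None
    prev_word_n = None
    for r in rows:
        key = (r.work, r.book_n, r.line_n)
        if key != prev_key or r.word_n <= prev_word_n:
            counts[r.work] += 1
        prev_key, prev_word_n = key, r.word_n
    return counts


if __name__ == "__main__":
    # Toy rows imitating a duplicated line number "385" (two lines, two words each).
    rows = [
        Row("Dion.", "39", "384", 1), Row("Dion.", "39", "384", 2),
        Row("Dion.", "39", "385", 1), Row("Dion.", "39", "385", 2),
        Row("Dion.", "39", "385", 1), Row("Dion.", "39", "385", 2),
        Row("Dion.", "39", "386", 1),
    ]
    print(distinct_line_count(rows)["Dion."])    # 3
    print(workaround_line_count(rows)["Dion."])  # 4
```

By construction, this heuristic cannot separate two lines that share a number if word_n keeps increasing across them.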
I am planning to look at those more closely, but for now, go ahead with the automatically computed line numbers, because that's what all our percentages etc. are based on.

If I repeat the xmlstarlet calculation with current SEDES files (605a27b3af22089379aad22ba96edf113970a7b0), the only change I get is 3 fewer lines in Dion. Using the automatically determined line numbers takes it down another 99 lines across 4 works. In the table, diff1 is redo_old_num_lines minus old_num_lines, and diff2 is new_num_lines minus old_num_lines.

| work | old_num_lines | redo_old_num_lines | diff1 | new_num_lines | diff2 |
|:--|--:|--:|--:|--:|--:|
| Phaen. | 1155 | 1155 | 0 | 1155 | 0 |
| Argon. | 5834 | 5834 | 0 | 5834 | 0 |
| Callim.Hymn | 941 | 941 | 0 | 940 | -1 |
| Hom.Hymn | 2342 | 2342 | 0 | 2342 | 0 |
| Il. | 15683 | 15683 | 0 | 15683 | 0 |
| Dion. | 21356 | 21353 | -3 | 21259 | -97 |
| Od. | 12107 | 12107 | 0 | 12107 | 0 |
| Q.S. | 8801 | 8801 | 0 | 8800 | -1 |
| Sh. | 479 | 479 | 0 | 479 | 0 |
| Theoc. | 2527 | 2527 | 0 | 2524 | -3 |
| Theog. | 1042 | 1042 | 0 | 1042 | 0 |
| W.D. | 831 | 831 | 0 | 831 | 0 |
| total | 73098 | 73095 | -3 | 72996 | -102 |
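As a cross-check of the arithmetic in the line-totals table, a short Python sketch that recomputes the two difference columns and the totals from the per-work numbers. The names are hypothetical; the data is copied from the table.

```python
# (old_num_lines, redo_old_num_lines, new_num_lines) per work, copied from the table.
TOTALS = {
    "Phaen.": (1155, 1155, 1155),
    "Argon.": (5834, 5834, 5834),
    "Callim.Hymn": (941, 941, 940),
    "Hom.Hymn": (2342, 2342, 2342),
    "Il.": (15683, 15683, 15683),
    "Dion.": (21356, 21353, 21259),
    "Od.": (12107, 12107, 12107),
    "Q.S.": (8801, 8801, 8800),
    "Sh.": (479, 479, 479),
    "Theoc.": (2527, 2527, 2524),
    "Theog.": (1042, 1042, 1042),
    "W.D.": (831, 831, 831),
}


def with_diffs(totals):
    """Return rows (old, redo, diff1, new, diff2) per work plus a 'total' row,
    where diff1 = redo - old and diff2 = new - old."""
    rows = {
        work: (old, redo, redo - old, new, new - old)
        for work, (old, redo, new) in totals.items()
    }
    old, redo, new = (sum(column) for column in zip(*totals.values()))
    rows["total"] = (old, redo, redo - old, new, new - old)
    return rows


if __name__ == "__main__":
    for work, row in with_diffs(TOTALS).items():
        print(work, *row)
    # Works whose count changes only because of the automatic line numbering.
    changed = [w for w, (old, redo, new) in TOTALS.items() if new != redo]
    print(changed, sum(redo - new for _, redo, new in TOTALS.values()))
```

It reproduces the -3 and -102 totals, and the 99-line drop from redo_old_num_lines to new_num_lines spread over Callim.Hymn, Dion., Q.S., and Theoc.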
Cf. #77
As of a58e572, there are these warnings: