Resolve duplicate line warnings in nonnusdionysiaca.xml #79

Closed
whoopsedesy opened this issue Jan 2, 2023 · 11 comments

@whoopsedesy
Collaborator

whoopsedesy commented Jan 2, 2023

Cf. #77

As of a58e572, there are these warnings:

warning: Dion.: duplicate line '25.528'
warning: Dion.: duplicate line '28.83'
warning: Dion.: duplicate line '28.84'
warning: Dion.: duplicate line '28.93'
warning: Dion.: duplicate line '28.94'
warning: Dion.: duplicate line '32.90'
warning: Dion.: duplicate line '37.568'
warning: Dion.: duplicate line '37.625'
warning: Dion.: duplicate line '39.385'
warning: Dion.: duplicate line '40.566'
warning: Dion.: duplicate line '44.145'
@whoopsedesy
Collaborator Author

25.528 was caused by an extraneous <lb /> empty line, fixed in 2869243.

whoopsedesy pushed a commit that referenced this issue Jan 2, 2023
The numbering was wrong, comparing to a book scan.
#79
@whoopsedesy
Collaborator Author

28.83, 28.84, 28.93, 28.94 look like simple errors. Fixed in efeb4ad.

https://archive.org/details/dionysiaca02nonnuoft/page/352/mode/1up
[image: dionysiaca02nonnuoft_0368]

These line numbers are also duplicates in Perseus 5.0:

https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="82">Δεξιόχου δὲ φονῆα καλέσσατο θυιάδι φωνῇ,</l>
<l n="83">λοίγιον ὑβριστῆρι χέων ἔπος ἀνθερεῶνι·</l>
<l n="84">‘στῆθι, κύων, μὴ φεῦγε, Κορύμβασε, καί σε διδάξω,</l>
<l n="83">οἷοι ἀκοντιστῆρες ὀπάονές εἰσι Λυαίου. </l>
<l n="84">ὑμέας εἰς Φρυγίην ληίσσομαι, ἄστεα δʼ Ἰνδῶν</l>
<l n="85">δῃώσει δόρυ τοῦτο, καὶ Ἰνδοφόνον μετὰ νίκην</l>
<l n="86">Δηριάδην θεράποντα Διωνύσοιο τελέσσω·</l>
<l n="87">παρθενικὴ δʼ ἀνάεδνος ἑὴν λύσειε κορείην,</l>
<l n="90">δεχνυμένη Σατύροιο δασυστέρνους ὑμεναίους, </l>
<l n="91">Ἰνδὴ Μυγδονίοιο μιαινομένη σχεδόν Ἕρμου.’</l>
<l n="92">ὣς φαμένου κεχόλωτο Κορύμβασος, ὀψιμόθου δὲ</l>
<l n="93">φθεγγομένου Κλυτίοιο διέθρισεν ἀνθερεῶνα·</l>
<l n="94">καὶ κεφαλὴ πεπότητο μετάρσιος ἅλματι Μοίρης,</l>
<l n="93">αἱμαλέῃ ῥαθάμιγγι περιρραίνουσα κονίην. </l>
<l n="94">καὶ νέκυν ὀρχηστῆρα παλινδίνητον ἐάσας</l>
<l n="95">Σειληνοὺς ἐφόβησε Κορύμβασος, ἔξοχος Ἰνδῶν,</l>
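For reference, the warnings listed at the top of this issue come from the project's own check, which isn't shown in this thread. As a rough stand-in only, assuming Perseus-style markup like the excerpt above (one <l n="..."> per line) and looking at a single book at a time, repeated n values can be found by counting them. duplicate_line_numbers is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_line_numbers(xml_fragment: str) -> list[str]:
    """Return the n values that occur on more than one <l> element."""
    # Wrap the fragment so a run of sibling <l> elements parses as one document.
    root = ET.fromstring(f"<div>{xml_fragment}</div>")
    counts = Counter(line.get("n") for line in root.iter("l"))
    # Counter keeps first-seen order, so duplicates come out in text order.
    return [n for n, c in counts.items() if c > 1]


SAMPLE = """
<l n="82">Δεξιόχου δὲ φονῆα καλέσσατο θυιάδι φωνῇ,</l>
<l n="83">λοίγιον ὑβριστῆρι χέων ἔπος ἀνθερεῶνι·</l>
<l n="84">‘στῆθι, κύων, μὴ φεῦγε, Κορύμβασε, καί σε διδάξω,</l>
<l n="83">οἷοι ἀκοντιστῆρες ὀπάονές εἰσι Λυαίου. </l>
<l n="84">ὑμέας εἰς Φρυγίην ληίσσομαι, ἄστεα δʼ Ἰνδῶν</l>
<l n="85">δῃώσει δόρυ τοῦτο, καὶ Ἰνδοφόνον μετὰ νίκην</l>
"""

print(duplicate_line_numbers(SAMPLE))  # ['83', '84']
```

A real check would also key on the book number, since the same line number legitimately recurs in every book.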

@whoopsedesy
Collaborator Author

whoopsedesy commented Jan 2, 2023

32.90 is another extraneous empty line. The lines 85–90 are out of order and manually numbered, but a <lb rend="displayNum" n="90" /> was somehow inserted as well, effectively creating a blank line numbered 90 in the middle. Fixed in f962267.

<lb rend="displayNum" n="84" />gai=a de\ khw/essan a)naptu/casa loxei/hn
<lb rend="displayNum" n="85" />a)/nqesin i(mertoi=si gamh/lion e)/stefen eu)nh/n:
<lb rend="displayNum" n="86" />kai\ kro/kos e)bla/sthse *ki/lic kai\ e)fu/eto mi=lac,
<lb rend="displayNum" n="88" />qh/lei+ d' a)/rsena fu/lla sune/pleke gei/toni poi/h|,
<lb rend="displayNum" n="89" /><lb rend="displayNum" n="90" />oi(=a po/qou pnei/wn kai\ e)n a)/nqesin a(bro\s a)koi/ths,
<lb rend="displayNum" n="87" />kai\ le/xos a)mfote/rwn e)peko/smee diplo/os o)/rphc,
<lb rend="displayNum" n="90" />*zh=na kro/kw| puka/sas kai\ mi/laki su/ggamon *(/hrhn:
<lb rend="displayNum" n="91" />kai\ *dio\s o)cu\n e)/rwta noh/moni dei/knue sigh=|

https://archive.org/details/dionysiaca02nonnuoft/page/450/mode/1up
[image: dionysiaca02nonnuoft_0466]

Perseus 5.0 has the same error:
https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="84">γαῖα δὲ κηώεσσαν ἀναπτύξασα λοχείην</l>
<l n="85">ἄνθεσιν ἱμερτοῖσι γαμήλιον ἔστεφεν εὐνήν· </l>
<l n="86">καὶ κρόκος ἐβλάστησε Κίλιξ καὶ ἐφύετο μῖλαξ, </l>
<l n="88">θήλεϊ δʼ ἄρσενα φύλλα συνέπλεκε γείτονι ποίῃ, </l>
<l n="89"/>
<l n="90">οἷα πόθου πνείων καὶ ἐν ἄνθεσιν ἁβρὸς ἀκοίτης, </l>
<l n="87">καὶ λέχος ἀμφοτέρων ἐπεκόσμεε διπλόος ὄρπηξ, </l>
<l n="90">Ζῆνα κρόκῳ πυκάσας καὶ μίλακι σύγγαμον Ἥρην· </l>
<l n="91">καὶ Διὸς ὀξὺν ἔρωτα νοήμονι δείκνυε σιγῇ</l>
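The 32.90 problem (like 25.528 above) comes down to two line-break milestones in a row: with milestone markup, a line's text is what follows its <lb/>, so an <lb/> with no text before the next <lb/> opens an empty line. A sketch, assuming a flat fragment like the Beta code excerpt above with no other inline elements; adjacent_lb_pairs is a hypothetical name. It reports both n values and leaves the decision of which milestone is extraneous to a human:

```python
import xml.etree.ElementTree as ET


def adjacent_lb_pairs(xml_fragment: str) -> list[tuple[str, str]]:
    """Return (n, next_n) for each <lb/> followed directly by another <lb/>.

    The text of a milestone-delimited line is the tail of its <lb/>, so an
    <lb/> whose tail is blank, right before another <lb/>, opens an empty line.
    Assumes a flat fragment (no other inline elements between milestones).
    """
    children = list(ET.fromstring(f"<p>{xml_fragment}</p>"))
    pairs = []
    for el, nxt in zip(children, children[1:]):
        if el.tag == "lb" and nxt.tag == "lb" and not (el.tail or "").strip():
            pairs.append((el.get("n"), nxt.get("n")))
    return pairs


SAMPLE = r"""
<lb rend="displayNum" n="88" />qh/lei+ d' a)/rsena fu/lla sune/pleke gei/toni poi/h|,
<lb rend="displayNum" n="89" /><lb rend="displayNum" n="90" />oi(=a po/qou pnei/wn kai\ e)n a)/nqesin a(bro\s a)koi/ths,
<lb rend="displayNum" n="87" />kai\ le/xos a)mfote/rwn e)peko/smee diplo/os o)/rphc,
<lb rend="displayNum" n="90" />*zh=na kro/kw| puka/sas kai\ mi/laki su/ggamon *(/hrhn:
"""

print(adjacent_lb_pairs(SAMPLE))  # [('89', '90')]
```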

@whoopsedesy
Collaborator Author

whoopsedesy commented Jan 2, 2023

The duplicate 40.566 looks like it's just an error, perhaps caused by confusion around out-of-order line numbers and a typographically split line. The second instance of 40.566 in the TEI should have been 40.567 to match the printed version. Fixed in 7574d19.

https://archive.org/details/dionysiaca03nonnuoft/page/192/mode/1up
[image: dionysiaca03nonnuoft_0208]

Perseus 5.0 has the same error:
https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="564">Ἄρτεμις οὐ βλάστησεν ἀφʼ ὕδατος, ὡς Ἀφροδίτη. </l>
<l n="566">ἔννεπε Καλλιρόῃ· Δροσερῇ μὴ κρύπτε καὶ αὐτῇ. </l>
<l n="565">Κύπριδι μᾶλλον ὄφελλες ἄγειν χάριν, ὅττι καὶ αὐτὴ </l>
<l n="566">αὐχένα κάμψεν Ἔρωτι, καὶ εἰ τροφός ἐστιν Ἐρώτων.</l>
<l n="567">δέχνυσο κέντρα πόθοιο, καὶ ὑγρονόμον σε καλέσσω</l>

As an aside, I'll note that the typographic split of 40.567 is not represented in the TEI, as the one at 37.625 is.

whoopsedesy pushed a commit that referenced this issue Jan 2, 2023
Caused by an apparently accidentally added line break. Compare to a book
scan:
https://archive.org/details/dionysiaca02nonnuoft/page/450/mode/1up

#79
whoopsedesy pushed a commit that referenced this issue Jan 2, 2023
Comparing to a book scan,
https://archive.org/details/dionysiaca03nonnuoft/page/192/mode/1up
the second instance of line number 566 should have been 567 instead.

Perhaps complicated by line 567 being typographically split in the
printed version. Though unlike 37.625, here the markup does not indicate
the typographical break.

#79
@whoopsedesy
Collaborator Author

whoopsedesy commented Jan 2, 2023

The cause of duplicate 37.625 is clear: it's a typographically split line at 37.621 whose two parts were wrongly given separate line numbers 621 and 622. What's called "622" should be the remainder of 621, and the following lines up to 625 shifted down a place.

There are two options for resolving it. @sasansom please advise.

  1. Keep the line break and number both partial lines 621. (As in #27, "Add metrical line numbers to each part of a typographically split line".)
    <lb rend="displayNum" n="621" />w(\s fame/nou
    <lb rend="displayNum" n="621" />*diktai=os e)qh/mona gou/nata pa/llwn...
    
  2. Remove the line break and make it one line in the TEI. (As with 40.567.)
    <lb rend="displayNum" n="621" />w(\s fame/nou *diktai=os e)qh/mona gou/nata pa/llwn...
    

In either case, we will also need a change in src/known.py. Currently it has the first part of 37.621 but not the second:

    'ὣς φαμένου': # Dion. 37.621
        (('ὣς', '+'), ('φαμένου', '--+')),

This is causing the second part to be scanned as a four-word all-spondee line:

Dion.,37,621,1,ὣς,ὡς,1,–,manual,1,ὣς φαμένου
Dion.,37,621,2,φαμένου,φημί,2,⏑⏑–,manual,1,ὣς φαμένου
Dion.,37,622,1,δικταῖος,δικταῖος,1,–––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
Dion.,37,622,2,ἐθήμονα,ἐθήμων,4,––––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
Dion.,37,622,3,γούνατα,γόνυ,8,–––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
Dion.,37,622,4,πάλλων,πάλλω,11,––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
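A cheap way to spot this kind of misfire in the output is to look for lines with no short syllable at all, which are suspicious in dactylic hexameter. A sketch only, assuming the column positions can be read off the rows above (counting from 0: book_n in column 1, line_n in column 2, the scansion symbols in column 7); the real files may well have a header row to use instead, and all_long_lines is a hypothetical name:

```python
import csv
import io
from collections import defaultdict

# Assumed 0-based column positions, read off the sample rows above.
BOOK_N, LINE_N, SCANSION = 1, 2, 7


def all_long_lines(csv_text: str) -> list[str]:
    """Return 'book.line' for every line whose scansion has no short syllable."""
    scans = defaultdict(str)
    for row in csv.reader(io.StringIO(csv_text.strip())):
        if row:  # skip blank lines
            scans[f"{row[BOOK_N]}.{row[LINE_N]}"] += row[SCANSION]
    return [key for key, s in scans.items() if "⏑" not in s]


SAMPLE = """
Dion.,37,621,1,ὣς,ὡς,1,–,manual,1,ὣς φαμένου
Dion.,37,621,2,φαμένου,φημί,2,⏑⏑–,manual,1,ὣς φαμένου
Dion.,37,622,1,δικταῖος,δικταῖος,1,–––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
Dion.,37,622,2,ἐθήμονα,ἐθήμων,4,––––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
Dion.,37,622,3,γούνατα,γόνυ,8,–––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
Dion.,37,622,4,πάλλων,πάλλω,11,––,auto,1,Δικταῖος ἐθήμονα γούνατα πάλλων...
"""

print(all_long_lines(SAMPLE))  # ['37.622']
```

A genuinely all-spondee line would also be flagged, so this is a pointer for manual review, not an error check.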

With option (1), we'll need the second part of 37.621 to be added to src/known.py. With option (2), perhaps automatic scansion, having the whole line to work with, will get it right and we can remove the entry from src/known.py.
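To make the known.py notation concrete: comparing the entry above with the CSV output, '+' corresponds to – and '-' to ⏑. The sketch below additionally assumes (not checked against src/known.py) that entries are matched on the exact text of a line or partial line, which would explain why only the first part of 37.621 gets the manual scansion. KNOWN, render, and lookup are hypothetical names, not the project's code:

```python
# Hypothetical excerpt shaped like the src/known.py entry quoted above:
# a phrase maps to (word, pattern) pairs, with '+' long and '-' short.
KNOWN = {
    'ὣς φαμένου':  # Dion. 37.621
        (('ὣς', '+'), ('φαμένου', '--+')),
}


def render(pattern: str) -> str:
    """Convert '+'/'-' notation to the – and ⏑ symbols used in the CSV output."""
    return pattern.replace('+', '–').replace('-', '⏑')


def lookup(text: str):
    """Return [(word, scansion)] for a known text, or None if there is no entry.

    Assumption: entries are matched on the exact text of the (partial) line;
    None means the text would be left to automatic scansion.
    """
    entry = KNOWN.get(text)
    if entry is None:
        return None
    return [(word, render(pattern)) for word, pattern in entry]


print(lookup('ὣς φαμένου'))  # [('ὣς', '–'), ('φαμένου', '⏑⏑–')]
print(lookup('Δικταῖος ἐθήμονα γούνατα πάλλων...'))  # None
```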

https://archive.org/details/dionysiaca03nonnuoft/page/78/mode/1up
[image: dionysiaca03nonnuoft_0094]

https://archive.org/details/dionysiaca03nonnuoft/page/80/mode/1up
[image: dionysiaca03nonnuoft_0096]

Perseus 5.0 has the same error:

https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="621">ὣς φαμένου</l>
<l n="622">Δικταῖος ἐθήμονα γούνατα πάλλων...</l>
<pb id="v.3.p.80"/>
<l n="623">τῷ δʼ ἐπὶ ποικιλόμητις ἀνέδραμεν ὠκὺς Ἐρεχθεύς,</l>
<l n="624">Παλλάδι Νικαίῃ μεμελημένος, αὐτὰρ ἐπʼ αὐτῷ</l>
<l n="625">Πρίασος ὠκυπόδης, Κυβεληίδος ἀστὸς ἀρούρης.</l>
<l n="625">τοῖσι μὲν ἐκ βαλβῖδος ἔην δρόμος· Ὠκύθοος δὲ </l>
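Option (2) boils down to joining the two partial lines and then shifting the consecutive run of numbers that follows down by one, which absorbs the duplicate. A minimal sketch on a plain list of (number, text) pairs rather than on the TEI itself; merge_split_line is a hypothetical helper:

```python
def merge_split_line(lines, n):
    """Join the line numbered n with the line after it, then renumber.

    lines: list of (number, text) pairs in document order. The consecutive
    run of numbers following the removed second part is shifted down by one;
    shifting stops at the first number that breaks the run (here, the
    formerly duplicated one).
    """
    i = next(k for k, (num, _) in enumerate(lines) if num == n)
    second_num, second_text = lines[i + 1]
    out = lines[:i] + [(n, f"{lines[i][1]} {second_text}")]
    prev, k = second_num, i + 2
    while k < len(lines) and lines[k][0] == prev + 1:
        prev = lines[k][0]
        out.append((prev - 1, lines[k][1]))
        k += 1
    return out + lines[k:]


LINES = [
    (621, "ὣς φαμένου"),
    (622, "Δικταῖος ἐθήμονα γούνατα πάλλων..."),
    (623, "τῷ δʼ ἐπὶ ποικιλόμητις ἀνέδραμεν ὠκὺς Ἐρεχθεύς,"),
    (624, "Παλλάδι Νικαίῃ μεμελημένος, αὐτὰρ ἐπʼ αὐτῷ"),
    (625, "Πρίασος ὠκυπόδης, Κυβεληίδος ἀστὸς ἀρούρης."),
    (625, "τοῖσι μὲν ἐκ βαλβῖδος ἔην δρόμος· Ὠκύθοος δὲ"),
]

merged = merge_split_line(LINES, 621)
print([num for num, _ in merged])  # [621, 622, 623, 624, 625]
print(merged[0][1])  # ὣς φαμένου Δικταῖος ἐθήμονα γούνατα πάλλων...
```

The rule only handles a strictly consecutive run; out-of-order numbering from editorial transpositions still needs a human decision.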

whoopsedesy pushed a commit that referenced this issue Jan 2, 2023
The consecutive lines that were formerly numbered 380 and 382 were
united into one line numbered 380 in commit
2b4737c; however the lines following
were not adjusted to match, which later led to
	warning: Dion.: duplicate line '39.385'

This commit adjusts the numbers 383, 384, 385 (first instance) to be 382,
383, 384.

Note, however, that this disagrees with a book scan, in which the
second half of what here is a united line 380 is numbered 382 (may be an
error in the book).
https://archive.org/details/dionysiaca03nonnuoft/page/150/mode/1up

#79
@whoopsedesy
Collaborator Author

39.385 is fixed in commit 57971eb, but it deserves more comment.

The cause appears to be the same as with 37.625, an earlier line that is typographically split. The two consecutive lines that have printed markers 380 and 382:

καὶ Ζέφυρος κεκόρυστο,
Νότος δʼ ἐπεσύρισεν Εὔρῳ,

are supposed to be only one metrical line.

The two lines were united in commit 2b4737c#diff-c4e168f0263854513ee388716b85afe7d1dc35e0efe75cf605a6f3d4e179d920L17325-R17324; however, the numbering of the following lines was not adjusted to match (shifting 383 down to 382, 384 down to 383, 385 down to 384), which is why there was still a duplicate line warning. I guess we are going with the explanation that the printed line number 382 is a misprint that should have been placed one line lower. I made the line number adjustment in 57971eb.

I will note that the printed version has a capital Νότος, while we and Perseus 5.0 have a lower-case νότος. @sasansom, is this an error?

https://archive.org/details/dionysiaca03nonnuoft/page/150/mode/1up
[image: dionysiaca03nonnuoft_0166]

The error is still present in Perseus 5.0. Already noted at #53 ("2b4737c nonnusdionysiaca.xml@19397").

https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="379">δυσμενέων ἐθέλοντες ἀιστῶσαι στίχα νηῶν, </l>
<l n="381">οἱ μὲν Δηριαδῆος ἀρηγόνες, οἱ δὲ Λυαίου· </l>
<l n="380">καὶ Ζέφυρος κεκόρυστο, </l>
<l n="382">νότος δʼ ἐπεσύρισεν Εὔρῳ, </l>
<l n="383">καὶ Βορέης Θρήισσαν ἄγων ἀντίπνοον αὔρην</l>
<l n="384">ἄγρια μαινομένης ἐπεμάστιε νῶτα θαλάσσης.</l>
<l n="385">καὶ στόλον ἰθύνουσα μαχήμονα Δηριαδῆος</l>
<l n="385">ὑσμίνης Ἔρις ἦρχε· Διωνύσοιο δὲ νηῶν </l>

@whoopsedesy
Collaborator Author

I have fixed the ones that were clear and simple errors. Besides 37.625, which has two possible solutions, @sasansom I need your attention on 37.568 and 44.145.

37.568

I don't see a way to fix 37.568. On page 74, there is a line that follows a printed line number 567 (and is therefore implicitly numbered 568); then on page 78 there is a printed line number 568.

The implicitly numbered 568 precedes a line with printed line number 572, but it cannot be that the line is instead supposed to be 571, because there is also another line 571 later on, on page 78.

effective line #   printed line #   *   note
565                565                  near bottom of page 74
566
567                567
568                                 *
572                572                  top of page 76
573
575                575
600                600                  top of page 78
601                601
568                568              *
569                569
570                570
571                571
602                602
603
604
605                605
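The "implicitly numbered" rule used in the table can be written down directly: a line without a printed number is one more than the line before it. Applied to the two runs in the table (bottom of p. 74 to top of p. 76, and p. 78), it reproduces the clash. effective_numbers is a hypothetical helper, and it assumes each run starts with a printed number:

```python
from collections import Counter


def effective_numbers(printed):
    """Fill in unprinted numbers: an unnumbered line is the previous line + 1.

    printed: printed line numbers in page order, None where nothing is
    printed. The first entry must be a printed number.
    """
    out = []
    for p in printed:
        out.append(p if p is not None else out[-1] + 1)
    return out


# Two runs taken from the table above.
p74_76 = [565, None, 567, None, 572, None]
p78 = [600, 601, 568, 569, 570, 571, 602, None, None, 605]

nums = effective_numbers(p74_76) + effective_numbers(p78)
print(effective_numbers(p74_76))  # [565, 566, 567, 568, 572, 573]
print([n for n, c in Counter(nums).items() if c > 1])  # [568]
```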

https://archive.org/details/dionysiaca03nonnuoft/page/74/mode/1up
[image: dionysiaca03nonnuoft_0090]

https://archive.org/details/dionysiaca03nonnuoft/page/76/mode/1up
[image: dionysiaca03nonnuoft_0092]

https://archive.org/details/dionysiaca03nonnuoft/page/78/mode/1up
[image: dionysiaca03nonnuoft_0094]

Perseus 5.0 has the same problem. (It also happens to be missing line 37.600, which I guess is unrelated.)

https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="565">μεσσατίῳ δὲ κάρηνον ἐπηρείδοντο μετώπῳ </l>
<l n="566">ἀκλινέες, νεύοντες ἐπὶ χθονός· ἐκ δὲ μετώπων</l>
<l n="567">θλιβομένων καμάτοιο προάγγελος ἔρρεεν ἱδρώς· </l>
<l n="568">ἀμφοτέρων δʼ ἄρα νῶτα κεκυφότα πήχεος ὁλκῷ</l>
<pb id="v.3.p.76"/>
<l n="572">δίζυγι συμπλεκέος παλάμης ἐτρίβετο δεσμῷ· </l>
<l n="573">σμῶδιξ δʼ αὐτοτέλεστος ἀνέδραμεν αἵματι θερμῷ,</l>
...
<l n="599">καὶ ταχὺς ἀντιβίου τετανυσμένος ὑψόθι νώτων,</l>
<pb id="v.3.p.78"/>
<l n="601">αὐχένι δεσμὸν ἔβαλλε βραχίονι, δάκτυλα κάμψας· </l>
<l n="568">μυδαλέῳ δʼ ἱδρῶτι χυτὴν ἔρραινε κονίην, </l>
<l n="569">αὐχμηρῇ ψαμάθῳ διερὴν ῥαθάμιγγα καθαίρων, </l>
<l n="570">μὴ διολισθήσειε περίπλοκος ἅμματι χειρῶν </l>
<l n="571">θερμὴν τριβομένοιο κατʼ αὐχένος ἰκμάδα πέμπων. </l>
<l n="602">τοῦ δὲ πιεζομένοιο συνέρρεον ὀξέι παλμῷ </l>

44.145

There are 5 unnumbered lines between 140 and 145 in the book scan; then there is no line 147. Perhaps another misprint?

https://archive.org/details/dionysiaca03nonnuoft/page/306/mode/1up
[image: dionysiaca03nonnuoft_0322]

https://archive.org/details/dionysiaca03nonnuoft/page/308/mode/1up
[image: dionysiaca03nonnuoft_0324]

Perseus 5.0 has the same problem:

https://github.com/PerseusDL/canonical-greekLit/blob/812f91f083f88cf789b37be89b22606ca6f27f6c/data/tlg2045/tlg001/tlg2045.tlg001.perseus-grc1.xml

<l n="140">κύμβαλα δʼ ἠχήεντα διαρρίψαντες ἀήταις </l>
<l n="141">καὶ πάταγον Βερέκυντα καὶ Εὔια τύμπανα Ῥείης</l>
<l n="142">ἕλκετε Βασσαρίδας μανιώδεας, ἕλκετε Βάκχας,</l>
<l n="143">ἀμφιπόλους Βρομίοιο συνήλυδας, ἃς ἐνὶ Θήβῃ</l>
<pb id="v.3.p.308"/>
<l n="144">Ἰσμηνοῦ διεροῖσιν ἀκοντίζοντες ἐναύλοις</l>
<l n="145">Νηίδας Ἀονίαις ποταμηίσι μίξατε Νύμφαις</l>
<l n="145">ἥλικας, Ἁδρυάδας δὲ γέρων δέξαιτο Κιθαιρὼν </l>
<l n="146">ἄλλαις Ἁδρυάδεσσιν ὁμόζυγας ἀντὶ Δυαίου. </l>
<l n="148">ἄξατε πῦρ, θεράποντες, ἐπεὶ ποινήτορι θεσμῷ, </l>
<l n="149">ἐκ πυρὸς εἰ πέλε Βάκχος, ἐγὼ πυρὶ Βάκχον ὀπάσσω·</l>
<l n="150">Ζεὺς Σεμέλην ἐδάμασσεν, ἐγὼ Διόνυσον ὀλέσσω. </l>
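The symptom here (five unnumbered lines between printed 140 and 145) can be caught mechanically by comparing each gap between ascending printed markers with the difference between them. A hypothetical check, sketched on a list of printed numbers in page order with None for unnumbered lines; it skips descending pairs, since those may signal an editorial transposition rather than a misprint:

```python
def marker_mismatches(printed):
    """Compare each gap between ascending printed markers with its expected size.

    printed: printed line numbers in page order, None for unnumbered lines.
    Returns (a, b, found, expected) for every pair of consecutive markers
    a < b whose gap does not hold exactly b - a - 1 unnumbered lines.
    Descending pairs are skipped, since they may signal a transposition.
    """
    problems = []
    last, gap = None, 0
    for p in printed:
        if p is None:
            gap += 1
            continue
        if last is not None and p > last and gap != p - last - 1:
            problems.append((last, p, gap, p - last - 1))
        last, gap = p, 0
    return problems


print(marker_mismatches([140, None, None, None, None, None, 145]))
# [(140, 145, 5, 4)]
```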

@sasansom
Owner

sasansom commented Jan 2, 2023

For Dion. 37.621 (discussed here), we should go with option 2. I have fixed this now in commit bc0e28f. Hopefully the automatic scansion will clear things up. I've deleted its entry in known.py (commit 1466dff).

@sasansom
Owner

sasansom commented Jan 3, 2023

> I will note that the printed version has a capital Νότος, while we and Perseus 5.0 have a lower-case νότος. @sasansom, is this an error?

Yes, I think it's an error. I've now capitalized Notos in commit 55fcfb4.

@sasansom
Owner

sasansom commented Jan 3, 2023

> I have fixed the ones that were clear and simple errors. Besides 37.625, which has two possible solutions, @sasansom I need your attention on 37.568 and 44.145.

37.568

> I don't see a way to fix 37.568. On page 74, there is a line that follows a printed line number 567 (and is therefore implicitly numbered 568); then on page 78 there is a printed line number 568.

> The implicitly numbered 568 precedes a line with printed line number 572, but it cannot be that the line is instead supposed to be 571, because there is also another line 571 later on, on page 78.

What's going on here is 1) Rouse, the editor of the Loeb edition, chose to transpose lines 37.568–71 to after 37.601 and before 37.602 (p. 78), but 2) at the top of p. 76 the edition incorrectly printed 572. It should have printed 573, or better yet, have printed 572 on the bottom of p. 74 (which I assume was the original intention before typesetting; it also corresponds to 37.572 in Keydell's edition [1959]).

And there is a separate problem. The xml file does not include 37.600 anywhere in the text (corresponding to 600 at the top of p. 78); it seems to have dropped out entirely. I have now added it.

I have fixed both in commit b180f53.

44.145

> There are 5 unnumbered lines between 140 and 145 in the book scan; then there is no line 147. Perhaps another misprint?

Here Rouse follows the edition of Ludwich (Teubner 1911) which transposes line 147 between 138 and 139.

[image: Screen Shot 2023-01-02 at 9 19 46 PM]

The problem is that the Loeb edition prints the verse numbers incorrectly; the sequence [138, 147, 139, 140] on p. 306 should be shifted down one line.

I've fixed this in commit 7c1e467.

@whoopsedesy
Collaborator Author

I've made sure that the commits resulting from this issue are reflected at #53.

whoopsedesy pushed a commit to sasansom/breaking-hermanns-bridge that referenced this issue Jun 1, 2023
The table of line totals formerly hardcoded the WORKS from Table 1 of
"SEDES: Metrical Position in Greek Hexameter". But there have been
changes to the corpus since then that affect line numbering, for example
sasansom/sedes#77
sasansom/sedes#79
sasansom/sedes@04dd4a1

Furthermore, Table 1 from the "SEDES" article is produced using an
xmlstarlet command running on the source TEI directly counting l and lb
elements, not on the derived CSV files. In our notes for the table we
remark that this is because duplicate line numbers cause the counts to
come out too low:

	For future reference:

	$ (echo "work,lines"; for a in corpus/*.xml; do echo "$a,$(xmlstarlet sel -t -m '//l' -v '"l"' -n -t -m '//lb' -v '"lb"' -n "$a" | wc -l)"; done) > corpus.csv
	> x <- read.csv("corpus.csv")
	> sum(x$lines)
	[1] 73098
	> summary(x$lines)
	Min. 1st Qu. Median Mean 3rd Qu. Max.
	479 1017 2434 6092 9628 21356

	---

	Table 1 numbers checked 2022-09-17, sedes commit cf795ef740.

	---

	> x <- bind_rows(map_dfr(Sys.glob("corpus/*.csv"), read_csv, col_types = cols(line_n = col_character(), book_n = col_character())))
	> x %>% group_by(work) %>% summarize(n = n())

	NB the line counts you get from counting distinct line numbers in the CSV are slightly different (smaller) from what you get from xmlstarlet, because of duplicated line numbers.
	> x %>% select(work, book_n, line_n) %>% unique %>% nrow
	[1] 72954
	> x %>% select(work, book_n, line_n) %>% unique %>% group_by(work) %>% summarize(n = n())
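A Python analogue of the two counts in these notes, for one book at a time: counting every l and lb element (as the xmlstarlet command does) versus counting distinct line numbers (as the unique() pipeline does). The sample fragment is made up, count_lines is a hypothetical name, and tags are compared by local name in case the TEI uses a namespace:

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so TEI tags compare by local name."""
    return tag.rsplit("}", 1)[-1]


def count_lines(xml_text: str) -> tuple[int, int]:
    """Return (number of l/lb elements, number of distinct n values)."""
    root = ET.fromstring(xml_text)
    ns = [el.get("n") for el in root.iter() if local(el.tag) in ("l", "lb")]
    return len(ns), len(set(ns))


# Made-up one-book fragment with a duplicated line number.
SAMPLE = '<div n="44"><lb n="144"/>a <lb n="145"/>b <lb n="145"/>c <lb n="146"/>d</div>'
print(count_lines(SAMPLE))  # (4, 3)
```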

In this repository I've started adding a workaround for the duplicate
line numbers, counting up a line whenever word_n fails to increase within
the same work, book_n, and line_n in input order. But even with that,
the automatically determined counts for Callim.Hymn and Q.S. are 1
smaller than they used to be, and unlike Dion. and Theoc., we have not
made changes to those texts that should affect line count totals. I am
planning to look at those more closely, but for now, go ahead with the
automatically computed line numbers, because that's what all our
percentages etc. are based on.
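One reading of that workaround, sketched over word rows in input order (count_lines_with_duplicates and the 4-tuple row layout are assumptions, not this repository's code): a new line starts when (work, book_n, line_n) changes, or when it stays the same but word_n stops increasing.

```python
def count_lines_with_duplicates(rows) -> int:
    """Count lines from (work, book_n, line_n, word_n) rows in input order."""
    count = 0
    prev_key, prev_word = None, None
    for work, book_n, line_n, word_n in rows:
        key = (work, book_n, line_n)
        # A changed key is a new line; an unchanged key whose word_n does not
        # increase is a second line reusing the same number.
        if key != prev_key or word_n <= prev_word:
            count += 1
        prev_key, prev_word = key, word_n
    return count


ROWS = [  # made-up word rows in which 28.83 and 28.84 are each used twice
    ("Dion.", "28", "83", 1), ("Dion.", "28", "83", 2),
    ("Dion.", "28", "84", 1),
    ("Dion.", "28", "83", 1), ("Dion.", "28", "83", 2),
    ("Dion.", "28", "84", 1),
]
print(count_lines_with_duplicates(ROWS))  # 4
print(len({r[:3] for r in ROWS}))  # 2 distinct (work, book_n, line_n)
```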

If I repeat the xmlstarlet calculation with current SEDES files
(605a27b3af22089379aad22ba96edf113970a7b0), the only change I get is 3
fewer lines in Dion. Using the automatically determined line numbers
takes it down another 99 lines across 4 works.

   work        old_num_lines redo_old_num_lines diff1 new_num_lines diff2
   <chr>               <dbl>              <dbl> <dbl>         <int> <dbl>
 1 Phaen.               1155               1155     0          1155     0
 2 Argon.               5834               5834     0          5834     0
 3 Callim.Hymn           941                941     0           940    -1
 4 Hom.Hymn             2342               2342     0          2342     0
 5 Il.                 15683              15683     0         15683     0
 6 Dion.               21356              21353    -3         21259   -97
 7 Od.                 12107              12107     0         12107     0
 8 Q.S.                 8801               8801     0          8800    -1
 9 Sh.                   479                479     0           479     0
10 Theoc.               2527               2527     0          2524    -3
11 Theog.               1042               1042     0          1042     0
12 W.D.                  831                831     0           831     0
13 total               73098              73095    -3         72996  -102
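The numbers in the table are consistent with diff1 = redo_old_num_lines - old_num_lines and diff2 = new_num_lines - old_num_lines. A quick cross-check of the totals and of the "another 99 lines across 4 works" figure, using values copied from the table:

```python
# (work, old_num_lines, redo_old_num_lines, new_num_lines), copied from the table.
ROWS = [
    ("Phaen.", 1155, 1155, 1155),
    ("Argon.", 5834, 5834, 5834),
    ("Callim.Hymn", 941, 941, 940),
    ("Hom.Hymn", 2342, 2342, 2342),
    ("Il.", 15683, 15683, 15683),
    ("Dion.", 21356, 21353, 21259),
    ("Od.", 12107, 12107, 12107),
    ("Q.S.", 8801, 8801, 8800),
    ("Sh.", 479, 479, 479),
    ("Theoc.", 2527, 2527, 2524),
    ("Theog.", 1042, 1042, 1042),
    ("W.D.", 831, 831, 831),
]

old = sum(r[1] for r in ROWS)
redo = sum(r[2] for r in ROWS)
new = sum(r[3] for r in ROWS)
changed = [work for work, _, r, n in ROWS if n != r]
print(old, redo, new)       # 73098 73095 72996
print(redo - new, changed)  # 99 ['Callim.Hymn', 'Dion.', 'Q.S.', 'Theoc.']
```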