Page ranges with letters #25

jgm · 2020-06-12T17:57:15Z

The spec doesn't make clear what to do when page ranges have letters, but there are tests like this. Some of the expected results seem wrong to me, though.

E.g. in page_Minimal.txt, the second to last example

n11564 - n1568

turns into

n11564–8

Why? That makes no sense to me. I am inferring that the algorithm is:

strip off the common letter content
rangeify the numbers as you'd normally do

Is that the algorithm, and is that really how it should work here?
Similar questions apply to some of the weird Chicago cases.
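
For illustration, the inferred two-step algorithm can be sketched in Python roughly as follows. This is only a reading of the test output, not confirmed processor behavior: the names `split_page` and `minimal_range` are hypothetical, and the step that treats a shorter second number as already abbreviated (1568 read as 11568) is an assumption made so that the sketch reproduces the result quoted above.

```python
import re

def split_page(page):
    """Split a page such as 'A231' into (letter prefix, digit string).

    Returns None when the page is not an optional run of letters
    followed by digits.
    """
    m = re.fullmatch(r"([A-Za-z]*)(\d+)", page.strip())
    return (m.group(1), m.group(2)) if m else None

def minimal_range(first, last, delimiter="\u2013"):
    """Collapse a page range 'minimally' when both pages share a prefix."""
    first, last = first.strip(), last.strip()
    a, b = split_page(first), split_page(last)
    # Only collapse when both pages parse and the letter prefixes match;
    # otherwise leave the range exactly as given.
    if a is None or b is None or a[0] != b[0]:
        return f"{first}{delimiter}{last}"
    prefix, x, y = a[0], a[1], b[1]
    if len(y) < len(x):
        # ASSUMPTION: a shorter second number is an abbreviated form,
        # so borrow the missing leading digits from the first number.
        y = x[: len(x) - len(y)] + y
    if len(y) > len(x):
        # Nothing to collapse; print the prefix once.
        return f"{prefix}{x}{delimiter}{y}"
    # Drop the leading digits the two numbers share, keeping at least one.
    i = 0
    while i < len(x) - 1 and x[i] == y[i]:
        i += 1
    return f"{prefix}{x}{delimiter}{y[i:]}"
```

Under these assumptions, `minimal_range("n11564", "n1568")` yields `n11564–8`, and a range whose two pages carry different prefixes is returned unchanged.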

bdarcus · 2020-06-12T22:53:56Z

I don't have my copy of Chicago handy. Does anyone else know? Beyond confirming the tests, it looks like we need to add something on this to the spec.

@bwiernik @denismaier @fbennett @adam3smith

adam3smith · 2020-06-13T01:10:30Z

I think the logic makes some sense: at times, page numbers have a prefix that identifies them as pages, e.g., in a separately published appendix. In those cases, there's no reason to think that standard page number collapsing rules shouldn't apply, i.e., if you're citing A231 to A232, A231-32 (Chicago) or A231-2 (minimal) does seem to make sense and increase readability.

The downside I'm seeing (and I think that may have come up before) is that you may see hyphens in electronic article numbers and then this rule can produce bizarre outcomes. I think by testing for identical prefixes it tries to prevent this.

The Chicago Manual has nothing to say on this or any of the examples in the Chicago weird tests, so no help there ;). However, Citing Medicine (aka Vancouver), which uses minimal page ranges, does have a number of relevant examples, and unless I'm misreading something, they confirm the test suite's behavior: https://www.ncbi.nlm.nih.gov/books/NBK7282/#A32739

Siedenburg J, Perry I, Stuben U. Tropical medicine and travel medicine: medical advice for aviation medical examiners concerning flight operations in tropical areas. Aviat Space Environ Med. 2005 Mar;76(3 Suppl):A1-30.

Barrett CJ, Malpas SC. Problems, possibilities, and pitfalls in studying the arterial baroreflexes' influence over long-term control of blood pressure. Am J Physiol Regul Integr Comp Physiol. 2005 Apr;288(4):R837-45.

Guo X, Lu X, Kassab GS. Transmural strain distribution in the blood vessel wall. Am J Physiol Heart Circ Physiol. 2005 Feb;288(2):H881-6.

jgm · 2020-06-13T05:04:23Z

It would be helpful if the test suite could have a field that says whether the test really tests spec behavior or just some additional behavior that citeproc-js implements but isn't part of the spec.

jgm · 2020-06-13T05:05:17Z

Anyway, thanks for explaining the logic. I think it makes sense, and I'm happy to close this!

jgm · 2020-06-13T05:13:17Z

By the way, the reason I have all these questions is that I'm writing a new Haskell CSL processing library. The legacy code in pandoc-citeproc (inherited from citeproc-hs) is really hairy and I can't understand it well enough to maintain it; in addition, I never really understood CSL, and this is forcing me to learn it. The new library will be faster and more accurate than pandoc-citeproc, and it is parameterized on a document type, so it should be easy to use outside of the pandoc ecosystem. If quality is high enough I might make it a dependency of pandoc so a filter isn't needed. Just about everything is implemented now except disambiguation and collapsing. I'm sure I'll have more questions as I go along, and I'll put it in a public repository once it gets a bit closer.

denismaier · 2020-06-13T07:01:07Z

Wow, very cool.
In case there is something a non-haskeller can help with (e.g. testing or so), just give a shout.

bdarcus · 2020-06-13T08:54:38Z

It would be helpful if the test suite could have a field that says whether the test really tests spec behavior or just some additional behavior that citeproc-js implements but isn't part of the spec.

This is related to #17, so I strongly agree.

Identifying these would also give us a checklist of details that we should add to the spec.

As you work through these, could you perhaps post a list of tests you think might qualify, beyond this one?

Also, do you have in mind what the content of that field should be?

I experimented a bit with just adding this to page_Minimal, and the current python script just ignores it; would of course be easy to extend though.

>>===== VERSION =====>>
1.0:citeproc-js
<<===== VERSION =====<<

So possible values would be the releases ("1.0", "1.1"), with an optional variant, including maybe "undocumented" (to flag what needs updating in the spec)?

EDIT: the linked PR adds version and tags field parsing to the processor script, and adds the former to all tests in the repo.
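
As a rough illustration of what reading such a field could look like, here is a minimal Python sketch. It is not the code from the test suite's script: the names `parse_sections` and `parse_version` are hypothetical, and it assumes the value is a version optionally followed by a colon and a tag, as in the `1.0:citeproc-js` example above.

```python
import re

# Matches one fixture section: an opening ">>=== NAME ===>>" line,
# the body, and the matching "<<=== NAME ===<<" closing line.
SECTION_RE = re.compile(
    r">>=+ *([A-Z-]+) *=+>>\n(.*?)\n?<<=+ *\1 *=+<<",
    re.DOTALL,
)

def parse_sections(text):
    """Return a dict mapping section names to their stripped bodies."""
    return {name: body.strip() for name, body in SECTION_RE.findall(text)}

def parse_version(value):
    """Split 'version:tag' into (version, tag); tag is None if absent."""
    version, _, tag = value.partition(":")
    return version.strip(), (tag.strip() or None)

fixture = (
    ">>===== MODE =====>>\n"
    "citation\n"
    "<<===== MODE =====<<\n"
    "\n"
    ">>===== VERSION =====>>\n"
    "1.0:citeproc-js\n"
    "<<===== VERSION =====<<\n"
)
sections = parse_sections(fixture)
print(parse_version(sections["VERSION"]))  # ('1.0', 'citeproc-js')
```

A value with no variant, such as `1.1`, would come back with `None` as the tag, so a runner could filter on the version alone.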

bdarcus · 2020-06-13T09:08:38Z

By the way, the reason I have all these questions is that I'm writing a new Haskell CSL processing library.

Since you're working on this (great!), the activity on the schema repo is aimed towards pushing out two releases this summer, one of them a 1.1 release (we're also doing a minor release with new strings: types, variables, etc.).

So far 1.1 doesn't have any breaking changes, but it does make explicit, and extends, a feature you already support in pandoc: citet citation config. The new element allows us to fully support styles like APA on this.

jgm · 2020-06-13T17:49:55Z

As you work through these, could you perhaps post a list of tests you think might qualify, beyond this one?

I'm sorry, I should have been keeping track. Just a couple things I can recall off the top of my head:

the test cases expect a conversion of unicode superscripted letters to csl <sup> elements
the list of "stop words" for title case in the spec does not seem to be complete; I had to add: "about", plus name particles "van", "von", "de", "d", "l".
in titlecase, non-English characters don't get uppercased, even in an English locale + citation; this isn't in the spec
in general, the spec is confusing about titlecase. It seems that things in ALL UPPERCASE don't get transformed, even though the spec says "For uppercase strings, the first character of each word remains capitalized. All other letters are lowercased." Maybe by "uppercase" the spec means just that the first letter is uppercase and the rest lowercase, but if so that's incredibly confusing, esp. with the subsequent use of "mixed case."
the spec says that the label element in names "must be included after the cs:name and cs:et-al elements, but before the cs:substitute element." This is false; it can come either before or after the name element. Moreover, its position relative to the name element turns out to be significant (it determines whether the label is printed before or after the name). I had to discover this by experimentation.
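
To make the first item in the list above concrete, here is a minimal Python sketch of one way to turn Unicode superscript characters into `<sup>` elements. The function name `superscripts_to_sup` is hypothetical, and relying on the Unicode `<super>` compatibility decomposition is an assumption about how to detect such characters, not a description of what citeproc-js or the tests actually do.

```python
import unicodedata

def superscripts_to_sup(text):
    """Wrap characters with a <super> decomposition in <sup>...</sup>."""
    out = []
    for ch in text:
        # e.g. U+207F (superscript n) decomposes to "<super> 006E"
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<super> "):
            # Rebuild the base character(s) from the hex code points.
            base = "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
            out.append(f"<sup>{base}</sup>")
        else:
            out.append(ch)
    return "".join(out)
```

Each superscript character gets its own element here; a real processor might prefer to merge adjacent runs into a single `<sup>`.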

fbennett · 2020-06-13T22:53:47Z

Just about everything is implemented now except disambiguation and collapsing.

Hi John. The specifics of the code won't be much use, but the disambiguation control loop of citeproc-js might be worth a quick look, if only for the sequence of operations. Apart from a few minor bugfixes, the code in there hasn't changed in six years of operation.

The specific code in the module re-renders citations for comparison, because the intermediate form of the citation is a bundle of strings with no pointers or links to the original input. If I understood Cormac's notes correctly, the intermediate form in citeproc-rs retains that connection, and so can perform disambiguation more efficiently, without proceeding to final rendering (using methods that I won't pretend to understand).

jgm · 2020-06-14T17:18:11Z

Thanks for the pointer. Disambiguation feels like the biggest hill to climb!

jgm closed this as completed Jun 13, 2020

bdarcus mentioned this issue Jun 13, 2020

missing spec details, errors citation-style-language/documentation#88

Open

bdarcus added a commit that referenced this issue Jun 13, 2020

Add version field to processor.py

9c89faa

Partially addresses #17 and #25, this adds a "VERSION" field to the processor.py script. Syntax for the field value is: [version]:[tag].

bdarcus mentioned this issue Jun 13, 2020

Add version, tags fields to script, tests #27

Merged
