Should we add translated-title and transliterated-title to the objectified title? #327
Comments
@fbennett - you know more about this area than us. Thoughts? |
There are various transliteration schemes (roman, cyrillic, many others), and some languages require a native transliteration for basic sorting (hiragana in Japanese, Taiwan sorts Unicode glyphs directly, but I'm not sure what they do in the PRC). So you need to provide for multiple transliterations, and key them by ID. BCP 47/RFC 5646 is a robust scheme covering pretty much everything. It can be validated loosely (regexp-wise) or tightly (by extending a regexp-wise scheme with a controlled list of allowed values---here's the raw data of the registry, batteries not included). Translation may be into multiple languages also, so the same applies there. Also, the language of (what I call) the "headline" field may differ from that of the item. In a scheme that pre-parses titles into main- sub- and short- elements, there would be a design decision over whether to apply a headline-field language to the entire set, or to the individual elements separately. Also, there would be an issue over whether to require full-parallel translation/transliteration of all sub-fields within the structured title (i.e. whether to allow a French version of the main title without also requiring a French version of the subtitle, and so forth). Whatever scheme is applied to a structured title should also be available on some text fields, and on creator fields with their different structure. A "CSL JSON" export from Jurism would show the structures I've come up with over there, for what that's worth. |
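The loose versus tight validation described above could be sketched roughly as follows. This is only an illustration: the regular expression checks the general shape of a tag, not the full RFC 5646 grammar, and `allowed_subtags` is a hypothetical caller-supplied set standing in for values taken from the registry.

```python
import re

# Loose, shape-only pattern: a 2-8 letter primary subtag followed by any number
# of hyphen-separated alphanumeric subtags (1-8 characters each).
# Assumption: this is a simplification, not the complete RFC 5646 grammar.
LOOSE_TAG = re.compile(r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*")


def is_loosely_valid(tag: str) -> bool:
    """Return True if the tag has the general shape of a language tag."""
    return LOOSE_TAG.fullmatch(tag) is not None


def is_tightly_valid(tag: str, allowed_subtags: set) -> bool:
    """Return True if the tag is well-shaped and every subtag is in a controlled list.

    `allowed_subtags` is a hypothetical lowercase set supplied by the caller.
    """
    if not is_loosely_valid(tag):
        return False
    return all(subtag.lower() in allowed_subtags for subtag in tag.split("-"))
```

For instance, `is_loosely_valid("ja-alalc97")` accepts the tag on shape alone, while `is_tightly_valid("ja-Hira", {"ja"})` rejects it because `hira` is not in the supplied list.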
@fbennett Can you post a Jurism CSL JSON export with some translated/transliterated fields or point to a test with some? I'm not familiar enough with the Jurism GUI to make such an item quickly. Also @fbennett could you explain what the On the one hand, a single On the other hand, implementing translated and transliterated forms of all fields allows for more robust multilingual support and handling of things like native Japanese sorting by hiragana. If we did something like this, I would suggest we make specific |
For Frank's question of language codes applied to sub-elements of a title, I think we could adopt a general inheritance mechanism. If a subfield lacks a language code, it inherits from the parent field; if a field lacks a language code, it inherits from the item. |
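A minimal sketch of that inheritance rule, assuming a hypothetical data shape in which a field may be an object with an optional `language` key and a subfield may be either a plain string or an object with its own `language` key. None of these key names are settled in this thread.

```python
from typing import Optional


def resolve_language(item: dict, field: str, subfield: Optional[str] = None) -> Optional[str]:
    """Resolve a language code by walking subfield -> field -> item."""
    value = item.get(field)
    if isinstance(value, dict):
        if subfield is not None:
            part = value.get(subfield)
            # A subfield with its own language code wins.
            if isinstance(part, dict) and "language" in part:
                return part["language"]
        # Otherwise the subfield inherits from the parent field.
        if "language" in value:
            return value["language"]
    # Finally, the field inherits from the item.
    return item.get("language")
```

With an item whose `language` is `en` and whose `container-title` object carries `language: fr`, the container title's main part would resolve to `fr`, while a title without any language key would resolve to `en`.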
Here is the data behind this citation:
[
{
"type": "book",
"multi": {
"main": {
"event-place": "ja",
"publisher": "ja",
"publisher-place": "ja",
"title": "ja"
},
"_keys": {
"event-place": {
"en": "Tokyo",
"ja-alalc97": "Tōkyō",
"ja-Hira": "とうきょう"
},
"publisher": {
"en": "Yuhikaku Publications",
"ja-alalc97": "Yūhikaku shobō",
"ja-Hira": "ゆうひかくしょぼう"
},
"publisher-place": {
"en": "Tokyo",
"ja-alalc97": "Tōkyō",
"ja-Hira": "とうきょう"
},
"title": {
"en": "Commentary on the Civil Code",
"ja-alalc97": "Minpō yōgi",
"ja-Hira": "みんぽうようぎ"
}
}
},
"event-place": "東京",
"language": "ja",
"number-of-volumes": "5",
"publisher": "有斐閣書房",
"publisher-place": "東京",
"title": "民法要義",
"author": [
{
"family": "梅",
"given": "謙次郎",
"multi": {
"_key": {
"en": {
"family": "Ume",
"given": "Kenjiro"
},
"ja-Hira": {
"family": "うめ",
"given": "けんじろう"
},
"ja-alalc97": {
"family": "Ume",
"given": "Kenjirō"
}
},
"main": "ja"
}
}
],
"issued": {
"date-parts": [
[
"1898"
]
]
}
}
] |
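To make the shape of this export concrete, here is a small sketch of how a consumer might read a variant of an ordinary (string) field from it. The helper name `field_variant` is hypothetical, and the fallback to the headline value is an assumption for illustration, not a description of what Jurism or citeproc-js actually does. Creator variants, which sit under `multi._key` on each name, are not handled here.

```python
def field_variant(item: dict, field: str, tag: str):
    """Return the variant of `field` keyed by language tag `tag`.

    Falls back to the headline value of the field when no such variant exists
    (an assumed behaviour, chosen only to keep the sketch total).
    """
    variants = item.get("multi", {}).get("_keys", {}).get(field, {})
    return variants.get(tag, item.get(field))
```

Applied to the item above, `field_variant(item, "title", "ja-alalc97")` would look up the romanized title, and a tag with no entry would fall back to the headline title.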
Jurism recognizes a vector in the Language field:
In those variables, the language code is mapped to the (English) name of the respective languages. |
Thanks @fbennett
So this is for rendering the language in citations? So to say “In English” or “Translated from Japanese”? |
Yes, the variables are available in citations. We've used them for translated legal documents in theses, where the original has been destroyed or is no longer available. |
If I remember correctly, this can also be used for conditional rendering based on the language of the current document. Like, your item is |
That sounds like a good approach. Perhaps you could elaborate a bit more how you think this might work? Some time ago, @cormacrelf envisioned introducing syntax for enabling/disabling certain features or sets of features: https://discourse.citationstyles.org/t/csl-1-2-planning/1476/7 |
@fbennett Why is it that you have two |
We could either adopt the current CSLm JSON or simplify a bit to something like:
Something like this has been on the table anyway, see https://juris-m.readthedocs.io/en/latest/dev-sync-simplification.html
Strictly speaking,
So, this will instruct citeproc-js to use the original variables for all types of variables, but for title variables it will also use the translated variant. |
I don't think we should do full ML anytime soon; certainly not for 1.1.
Really the question I had for 1.1 is if we move variables like translated-title to the title object, without otherwise modifying them.
|
Agreed. There are three factors I'm considering.
With these considerations, making If we did move |
I don't think we should adopt this. With the |
We can discuss multilingual data structures in another thread, but my inclination would be for all of this to occur at the field level. So, any field might be an object with
|
My impulse is we should do this. The only reason I think not to is if it presented some future barrier to fuller ML support. @denismaier - thoughts?
Maybe take this comment and turn it into an issue ("reference in new issue"), for future reference? |
I think it would be the opposite; doing it would make fuller ML support easier.
Cool! Didn't know that button existed. |
So are we talking a PR with this:
title:
  translated: foo
  main: bar
.. or this?
title:
  translated:
    main: foo
  main: bar
I guess the latter? And then remove the Maybe, per @denismaier's initial impulse, we call it That would give more future flexibility, should we possibly need it. |
The second option. |
The aim was (and is) to maintain compatibility with CSL-JSON to the extent possible. Ordinary fields are strings, so it's not possible to give them a sub-field without changing the data type. Creator variables are already objects, so a subfield can be added without changing the data type; and since creator fields are dynamic, it makes sense to tie the variants to each name instance---and CSLm-JSON just reflects that structure, which keeps exports simple. |
Yes, we should add
In terms of structure, it should mirror the standard structure of title variables, so:
Such a structure would be extensible if need arises. We can add language variables, type variables to indicate if the alternate is a translation or a transliteration, and convert |
I don’t think a type is necessary. That will be clear from the language code (as in the CSLm JSON example above). |
I'm agnostic.
This attribute is broader, and its values would be things like "translated." So I don't think they need to mirror each other; do they? |
Yes, it's broader. I just wanted to point out that it shouldn't just be a flat string, but have distinct properties for title parts. |
No, I don't think the values should be "translated", etc. We should go one of three ways.
Of these, 1 and 3 are compatible with each other. We could do 1 now, but then easily add 3 as an option in a future version or in a multilingual extension. |
My thinking is that we should make a solution that flows easily into having multiple alternates for multilingual support (or even just picking a translation based on the document locale). We could even fairly easily do (3) in v1.1 without the expectation of full ML support by:
That honestly might be the most straightforward approach. |
How so? You mean by virtue of it being under an |
I guess "translation or transliteration" means two different things in that sentence... Certain language codes refer to transliterations: e.g. he-alalc97 transliterated according to the Library of Congress Romanization rules. |
The BCP 47/RFC 5646 scheme Frank linked to defines language codes unambiguously, not only for languages/locales but also for scripts and the like. It's summarized here. The basic structure is For example, if an item with |
Put generally, "different language" = translation, "same language, different script" = transliteration. |
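That rule of thumb can be sketched as a comparison of primary language subtags. This is a deliberately naive illustration: it treats any same-language tag that differs from the main tag as a transliteration, so a tag that differs only by region (e.g. `en-GB` against `en`) would be misclassified. The function name is hypothetical.

```python
def classify_variant(main_tag: str, variant_tag: str) -> str:
    """Classify a variant relative to the main language tag of a field."""
    main_lang = main_tag.split("-")[0].lower()
    variant_lang = variant_tag.split("-")[0].lower()
    if variant_lang != main_lang:
        # Different language: treat as a translation.
        return "translation"
    if variant_tag.lower() == main_tag.lower():
        # Identical tag: this is the original form itself.
        return "original"
    # Same language, different script or variant subtag: treat as a transliteration.
    return "transliteration"
```

With the Jurism data earlier in the thread, a main tag of `ja` would make `en` a translation and both `ja-alalc97` and `ja-Hira` transliterations.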
So 3. would come down to this:
|
Transliteration is not the issue; I should have made clear I was asking about the translation part. How do you distinguish the original and translated title? |
Close, I was thinking this:
|
The |
Ok. Any serious reasons not to go with 3 now? |
If your question is: "I am citing an English translation of a Spanish book. How do I refer to the original Spanish title?" That would be stored in |
I was thinking language would be an inheritable property, right? If so, this option would allow for something like this:
language: en
title:
  main: A title in English
  sub: with a subtitle
  language-alternate:
    - lang: de
      main: An alternate title in German
      sub: with a subtitle
container-title:
  language: fr
  main: A title in French |
I really like that approach, and I think we should adopt this, unless there are serious drawbacks to this. But, if we adopt this, we'll also have to figure out how these language alternates will be accessible in styles. |
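One hypothetical way a processor might pick among such alternates, for example by document locale, is sketched below. It assumes the title object carries a `language-alternate` list whose entries have a `lang` key, as in the example above; the matching rule (primary subtag only) and the function names are assumptions for illustration, not an agreed design.

```python
def primary_subtag(tag: str) -> str:
    """Return the lowercase primary language subtag of a tag such as 'de-AT'."""
    return tag.split("-")[0].lower()


def pick_title(title: dict, item_language: str, wanted: str) -> dict:
    """Pick the title form matching the wanted language, else the original.

    `title` is a title object with an optional 'language' key and an optional
    'language-alternate' list; `item_language` is the inherited item language.
    """
    own_language = title.get("language", item_language)
    if primary_subtag(own_language) == primary_subtag(wanted):
        return title
    for alternate in title.get("language-alternate", []):
        if primary_subtag(alternate.get("lang", "")) == primary_subtag(wanted):
            return alternate
    # No matching alternate: fall back to the original title.
    return title
```

How a style would actually request an alternate (an attribute, a separate variable, or a style-level option) is left open here, as it is in the discussion.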
How about a simpler syntax— |
Can you switch between transliterations and translations with that approach? |
We could have Transliterations are a bit more involved a question. For example, publications in Latin-script languages often want to print the transliteration instead of the original script version of a title (e.g., APA calls for transliteration "if possible or advisable"). So, we might want to offer a style-level option to substitute transliterations if available. |
Ok.
Jurism currently lets you cite a combination of the title in the original script and title, a transliteration, and a translation. I was aiming for something similar in the other thread. Style-level attributes could work. (But I imagine a |
Implementation details aside: Do we have a consensus that |
@bwiernik did you by chance notice the BCP 47 -t- element? It is for marking content as transformed, such as in transliterations. Here is the RFC: https://tools.ietf.org/html/rfc6497 |
@HughP Okay, that's interesting. That could potentially save the need to have distinct Edit: Though thinking about it, we wouldn't necessarily need to use
But we could still rely on the locale definitions to indicate whether a field is a transliteration or translation. |
Yes, that may be true. I think the discussion on this thread was debating the architecture of a set of key-value pairs. In the proposals I see, there is a hierarchical distinction between source language and target language so these functions can always be distinguished via the hierarchy. Though perhaps for human readability there might be some utility in using |
It might nevertheless be useful once we go beyond the current model where each item has exactly one main language for the item as a whole. E.g., |
As a follow-up on converting titles into objects, I think we should discuss whether there is any value in adding alternate forms (translated or transliterated title forms) to these title objects. Maybe so:
Or just so: