Update ISOCat reference to DatCatInfo #2227

Closed
JanelleJenstad opened this issue Feb 7, 2022 · 23 comments · Fixed by #2359

Comments

@JanelleJenstad
Contributor

From Roberto Rosselli Del Turco via TEI-L (2022-02-02): "in the Guidelines section devoted to Dictionaries there's one reference to the ISOCat standard, but the latter has been superseded by DatCatInfo: http://www.datcatinfo.net/ (with a more precise URL of course) instead of http://www.isocat.org/ in https://tei-c.org/release/doc/tei-p5-doc/en/html/DI.html#index-egXML-d52e79215."

@JanelleJenstad JanelleJenstad self-assigned this Feb 7, 2022
@bansp
Member

bansp commented Feb 7, 2022

I can try to take this one, because it requires a bit of crafting. The new standard defines a slightly different entity with a slightly different role, so a direct substitution might not be the best way to handle that. But I've just updated an ISO standard wrt the references to the dead ISOCat (even Heisenberg wouldn't be uncertain, here, sadly), so I think I can handle this one as well. Not sure if I don't actually have that assignment on a very old plate, somewhere among the grey TEI issues -- gonna have a look now.

@bansp
Member

bansp commented Feb 7, 2022

OMG, look at that... #232, #1089, #1866 . It's time... :-)

@martinascholger
Member

Hm, I thought I had already implemented #1866. I will investigate why I did not.

@bansp
Member

bansp commented Feb 7, 2022

Oops, the middle issue that I listed above is actually closed. Well, @martinascholger, please ping me if you decide that I can be of use. My instinct would be to look at all the text fragments that mention the mechanism, remove the recommendation to use ISOCat, and definitely not replace it with a recommendation of the privately-owned datcat, but rather generalise. The original 'mistake' was to recommend ISOCat as if it were the only such service, when it should simply have been treated as an example of an external reference taxonomy. Its use goes beyond language-related applications, too.

@JanelleJenstad
Contributor Author

JanelleJenstad commented Apr 6, 2022

@bansp: Can you recommend some wording that would allow us to indicate the need to refer to a standard without mentioning datcat?

Here are the sections of the Guidelines that currently contain references to ISOcat:

Attribute class: att.datcat

Element spec: <gram>

Example in 18.3

Text of 18.3: "Whether at the level of feature-system declarations, feature- and feature-value libraries, or individual features, it is possible to align both feature names and their values with standardized external data category repositories such as ISOcat. In the following example, both the feature part_of_speech and its value #commonNoun are aligned with the respective definitions provided by [ISO DCR (Data Category Registry)], as implemented by ISOcat."

Note 82 in 18.3

Text of 9.5.2: "The TEI provides means to align grammatical categories as well as their content with the ISOcat reference, which is a Web implementation of [ISO 12620]. / In the example below, a fragment of the entry for isotope cited in section 9.3.2 Grammatical Information is adorned by references to ISOcat definitions for "part of speech" (dcr:datcat) and "adjective" (dcr:valueDatcat). Depending on the status and extent of the dictionary, various strategies may be used to reduce the redundancy of the repeated ISOcat references."

Example in 9.5.2
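
For readers new to the mechanism quoted from 9.5.2 above, here is a minimal sketch of how the two attributes sit on a grammatical element and how they can be read back. It is not the Guidelines' own example: the `<gram>` fragment is made up for illustration, the `example.org` URIs are placeholders rather than real data category identifiers, and `category_alignments` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DCR_NS = "http://www.isocat.org/ns/dcr"  # namespace URI bound to the dcr: prefix

# Illustrative fragment only; the example.org URIs are placeholders (assumption).
FRAGMENT = f"""
<gramGrp xmlns="{TEI_NS}" xmlns:dcr="{DCR_NS}">
  <gram type="pos"
        dcr:datcat="https://example.org/datcat/partOfSpeech"
        dcr:valueDatcat="https://example.org/datcat/adjective">adj</gram>
</gramGrp>
"""


def category_alignments(xml_text):
    """Return (text, datcat, valueDatcat) for each element carrying dcr:datcat."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter():
        datcat = el.get(f"{{{DCR_NS}}}datcat")
        if datcat is not None:
            value = el.get(f"{{{DCR_NS}}}valueDatcat")  # None when absent
            rows.append((el.text, datcat, value))
    return rows


for row in category_alignments(FRAGMENT):
    print(row)
```

Here `dcr:datcat` points at the definition of the category ("part of speech") and `dcr:valueDatcat` at the definition of its value ("adjective"); both are plain URI references, so any repository could stand behind them.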

@bansp
Member

bansp commented Apr 7, 2022

I can submit a PR some time after this week. Will try to keep this in sights. Cheers!

@JanelleJenstad
Contributor Author

Thanks @bansp! We refrigerate this weekend. We can sneak these changes into the upcoming release if you are able to do them early next week.

@bansp
Member

bansp commented Apr 7, 2022

I'd love to say "challenge accepted", but I am unable to make promises at this point. Will try my best though, knowing the stakes. :-)

@ebeshero
Member

Council F2F: @bansp We're approaching another release in October, so we're hoping perhaps to fix this by then. Can you help?

@bansp
Member

bansp commented Sep 12, 2022

Heck, yes, @ebeshero and thanks for the ping -- I'll handle this after my Wednesday presentation and before Saturday morning.
Funny that I thought about that issue (and a few other of my old promises) maybe two minutes before seeing your message.

@martindholmes
Contributor

@bansp Can we discuss this at the Ling SIG this afternoon? Seems like an appropriate venue since this is a matter for linguists.

@martindholmes
Contributor

At Ling SIG, @bansp gave us a really helpful overview of the situation here and we think every current reference in the Guidelines to ISOCat should be replaced by a generic recommendation to point to a data category repository which ideally conforms with ISO 12620 if appropriate.

@martindholmes
Contributor

Following discussion with @peterstadler and @bansp: @bansp will revise the spec page for att.datcat to serve as an example, and make a pull request; then Council can track down and revise all other mentions and invocations of ISOCat throughout the Guidelines and fix them following @bansp's example.

@bansp
Member

bansp commented Sep 16, 2022

While I'm nibbling on this, a sub-issue struck me, namely the matter of the dcr namespace, which is "http://www.isocat.org/ns/dcr" (recall that it's dcr:datcat).
The namespace can be taken as a URI, whether or not it resolves to anything in the future (atm it doesn't). But it's not possible to predict which authority is going to control it in the future, unless someone (CLARIN, TEI-C) were to purchase the domain. I don't think either will care to. And we could ignore the authority, just as we could ignore the dead link, but we know that some users are going to keep asking about that.
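
To make the "namespace as a mere URI" point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. It shows that a parser treats the namespace purely as an identifying string: nothing is fetched from the network, and the prefix itself is irrelevant. The `example.org` value and the helper name `namespaced_attrs` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DCR_NS = "http://www.isocat.org/ns/dcr"

# The same namespace URI bound to two different prefixes.
# Parsing never dereferences the URI, so a dead domain does not matter here.
DOC_A = f'<gram xmlns:dcr="{DCR_NS}" dcr:datcat="https://example.org/dc/pos"/>'
DOC_B = f'<gram xmlns:x="{DCR_NS}" x:datcat="https://example.org/dc/pos"/>'


def namespaced_attrs(xml_text):
    """Return the attributes as ElementTree sees them, keyed by '{namespace}local'."""
    return dict(ET.fromstring(xml_text).attrib)


print(namespaced_attrs(DOC_A) == namespaced_attrs(DOC_B))
```

So the technical function of the namespace survives the demise of isocat.org; what remains open is the question of authority raised above.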

One solution could be to ask the CLARIN Standards Committee to assign a "clarin.eu"-based namespace URI for the dcr: prefix. But this solution is not necessarily optimal from the point of view of the TEI, because it still delegates the authority to an external party. The CSC maintains two such URIs at the moment for ISO, and you can sense that there is a certain degree of randomness in this: https://www.clarin.eu/content/standards#namespace-assignment .

Another solution, which at this point seems to me pretty optimal (under the circumstances), is to deprecate the dcr prefix and to place the two attributes directly in the TEI namespace. The prefix was originally justified by the fact that the old DCR standard that used ISOCat explicitly defined these attributes. However, the new DCR standard ("ISO 12620-1, Management of terminology resources - Data categories - Part 1: Specifications") does not even mention the namespace bound to the dcr prefix, and if it were to mention it (which I don't think is going to happen in the foreseeable future), it wouldn't use the string "isocat" as its basis, because the TC37 secretariat is oversensitive to the substring "iso" and will surely not approve "isocat". So it would use something else (but it won't be anything, I bet). The standard mentions the dcr:datcat attribute as an example of how the TEI utilizes the attributes (rather than of how these attributes should be used in XML documents). In other words, the standard has dissociated itself from the datcat attributes, merely acknowledging them as a TEI mechanism. ("Convoluted!", you will exclaim. So it is.)

"Oh no," you will go on, "the dcr prefix is mentioned by an ISO standard, so we have to keep it!" -- no worries. The editorial procedure of removing "dcr:" from one example in the standard (and it's only used in one example) is a matter of a microrevision, which the relevant working group will be able to perform in a month or two, if it tries -- and I can inform them of the need, if such a need arises.

I see the deprecation of the prefix as "nativizing" the DCR mechanism by the TEI. If that move were approved by the Council, I would be happy to use the unprefixed attributes in the version of ISO MAF that is about to be submitted for the committee ballot. Perhaps there is a chance for the Council to address this in the time remaining in Newcastle?
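
For existing documents, "nativizing" would amount to renaming `dcr:datcat` and `dcr:valueDatcat` to unprefixed `datcat` and `valueDatcat`. The following is only a sketch of such a migration with Python's standard library, assuming those are the only two attributes affected; `nativize_dcr` and the sample markup with its `example.org` URIs are hypothetical, not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

DCR_NS = "http://www.isocat.org/ns/dcr"
DCR_ATTRS = ("datcat", "valueDatcat")  # the two attributes discussed in this thread

# Made-up sample; the example.org URIs are placeholders (assumption).
SAMPLE = (
    f'<gram xmlns="http://www.tei-c.org/ns/1.0" xmlns:dcr="{DCR_NS}" type="pos" '
    'dcr:datcat="https://example.org/dc/pos" '
    'dcr:valueDatcat="https://example.org/dc/adj">adj</gram>'
)


def nativize_dcr(root):
    """Rename dcr:-namespaced attributes to unprefixed ones, in place.

    Returns the number of attributes renamed.
    """
    renamed = 0
    for el in root.iter():
        for local in DCR_ATTRS:
            qualified = f"{{{DCR_NS}}}{local}"
            if qualified not in el.attrib:
                continue
            if local in el.attrib:
                # Refuse to clobber an unprefixed attribute that already exists.
                raise ValueError(f"element already has an unprefixed {local!r}")
            el.set(local, el.attrib.pop(qualified))
            renamed += 1
    return renamed


root = ET.fromstring(SAMPLE)
print(nativize_dcr(root), sorted(root.attrib))
```

Other attributes (such as `type` above) are left untouched, and running the function a second time renames nothing, so it is safe to apply repeatedly.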

@martindholmes
Contributor

I agree with removing the prefix and the namespace. That will make the attributes seem more generally applicable for people who want to point at their own data categories. @laurentromary do you have any thoughts on this?

@laurentromary
Contributor

Yes, that's a good option!

@ebeshero
Member

ebeshero commented Sep 25, 2022

@rettinghaus @bansp @sydb @martindholmes @martinascholger I was working on #2340 about inconsistent ISO referencing, and stumbled into the datcat question on my own. Can we do one of the following here?

@ebeshero
Member

ebeshero commented Sep 25, 2022

Reviewing the ticket and recalling conversations, that link update seems precisely what we want to avoid. Sorry for barging in on the back of another ticket! Anyway, I’ll concentrate on the easier updates in #2340 . And await the PR.

@ebeshero
Member

ebeshero commented Sep 25, 2022

But I wonder if we can, for the moment, just remove the references to http://isocat.org/ as preparation for the coming PR, since we know we shouldn’t be pointing to it at all. We seem agreed here not to be pointing to a standard, and the idea of “nativizing” data categories (and perhaps other things for which we used to rely on ISO) seems the path for TEI.

@bansp
Member

bansp commented Sep 25, 2022

If there is no rush to remove the references to isocat today as opposed to a week ago, may I ask for them to be left in place for now? I'm going over them all (I know, more than I was asked for, but it's hard to leave them if I can handle them), and I already anticipate some conflicts with my version even before I submit the PR. Not increasing the amount of extra work spent on resolving conflicts would be very welcome, because I'm racing against several clocks (but this item is my priority now).

@ebeshero
Member

ebeshero commented Sep 25, 2022

Got it--thank you, @bansp. I won't touch the isocat links and will leave this to you. I'd like Council to review my table of proposed ISO citation updates anyway before I do anything more on #2340. Let us know if we can help with anything on your end!

@bansp
Member

bansp commented Oct 18, 2022

The result is in PR #2359 , spread mostly across Specs/att.datcat.xml and the FS and DI chapters, with some little extras.
The description in the att.datcat spec is a bit verbose... but I tried to gather the various ways in which the attributes can be used, with examples. Eliminated references to ISOCat from FS, DI and the <gram> spec. Added some bits to the FS chapter in order for the individual examples to start making sense when combined.
The results can also be seen directly in:

Note: the three above documents invoke the Paderborn version of the TEI schema, with the dcr: namespace eliminated.

I hope the result is acceptable (I still need to check if I have introduced any layout mess in FS; but will catch some sleep first). I am of course willing to work on improving/rearranging the info even if something close to its current version gets merged for the upcoming release. Cheers!

@bansp
Member

bansp commented Oct 24, 2022

This is firstly to register my thanks to @sydb for the wonderful lot of work he has put into his review of the PR. I prefer to do this here, rather than within the PR itself, which is now lengthy and whose persistence I'm not sure of. Extremely helpful, and I think I'm learning even some small things, like whether I really want to use a solidus, or whether using it is just admitting my laziness...
There's also a practical question to this though: how real is the chance to make it for the release? I need to spend some time on a yearly report today, and I'm not sure how long the fixes are going to take -- some are, admittedly, quick, but some points have made me realise that I had managed to tuck an entire sub-mechanism of the TEI aside in my brain while trying to handle the issue quickly, even despite the documentation for prefixDef still being open in one of my tabs. Wonderful stuff, I'll be happy to address it, but -- is there still a chance that, having put another night into this, I will see the PR go into the current release, or will I rather learn then that I should have taken a deep breath already at this point and calmly scheduled this work item among the others that are winking at me from my calendar? Are we still in the process of the current release, @peterstadler , please?
