Dead iso cat #2359

Merged
merged 47 commits into from
Oct 25, 2022
Changes from 19 commits
Commits
0624736
initial version, still using the DCR namespace and prefix
bansp Sep 17, 2022
6f16789
Re-generated spec lists.
bansp Sep 17, 2022
ceb9509
remove the dcr prefix from the spec alone (causing a massive amount o…
bansp Sep 17, 2022
cee6a16
remove the dcr prefix from the places where it used to be mentioned i…
bansp Sep 18, 2022
3c0a143
add text, add category and taxonomy to att.datcat
bansp Sep 21, 2022
e140a74
withdraw from category and taxonomy, after all (they have equiv so no…
bansp Sep 21, 2022
2c58823
merge dev
bansp Oct 13, 2022
2d1719c
Re-generated spec lists.
bansp Oct 13, 2022
127fd1c
an extended example for the Council
bansp Oct 14, 2022
ffeb733
initial structuring; I try to refrain from exemplifying 'equiv' here
bansp Oct 17, 2022
011efa6
fixes and extensions
bansp Oct 17, 2022
1458133
validate targetDatcat in the feature-modelling elements; more text
bansp Oct 17, 2022
4c5a910
small fixes
bansp Oct 17, 2022
1cb6cf5
eliminate the code element
bansp Oct 17, 2022
c41b163
eliminate ISOCat, temporary change of the referenced schema (to Pader…
bansp Oct 17, 2022
c32c3dc
minor but illustrative addition
bansp Oct 17, 2022
82d375d
remove references to ISOCat, add more examples in order to complement…
bansp Oct 18, 2022
2d865b5
spotted by chance; eliminated an ISOCat reference in the desc
bansp Oct 18, 2022
ca59bc9
minor fixes in the wording and in example layout
bansp Oct 18, 2022
43fcc60
add fixes as requested by the reviewers
bansp Oct 20, 2022
c71d79f
add fixes as requested by the reviewers
bansp Oct 20, 2022
5854cd7
turn a dead reference into a bland remark (no reference set of data c…
bansp Oct 20, 2022
9bcf4dd
reworded and modified in reaction to PR reviews
bansp Oct 20, 2022
e074d29
modify the validation script name
bansp Oct 20, 2022
a8bab27
fix the versionDate
bansp Oct 20, 2022
dad0474
fix a symbol at "commonNoun"
bansp Oct 20, 2022
1e9f211
editorial fixes, links
bansp Oct 20, 2022
5d38ae2
better link to DatCatInfo
bansp Oct 20, 2022
339a74d
Update P5/Source/Specs/att.datcat.xml
bansp Oct 20, 2022
0da6803
Update P5/Source/Specs/att.datcat.xml
bansp Oct 20, 2022
45b1f63
Update P5/Source/Guidelines/en/FS-FeatureStructures.xml
bansp Oct 20, 2022
1402738
use `<q>` for quoted material and `<soCalled>` for names of concepts,…
bansp Oct 20, 2022
9cf826d
editorial fixes
bansp Oct 20, 2022
d067aba
removed some literal quotes in favour of specialized elements
bansp Oct 20, 2022
f8e95f1
Update P5/Source/Guidelines/en/FS-FeatureStructures.xml
bansp Oct 21, 2022
0920fc0
Update P5/Source/Specs/att.datcat.xml
bansp Oct 21, 2022
3b05c99
amend the exemplum concerning the use of datcat atts within dictionaries
bansp Oct 23, 2022
0f069db
example markup rearranged slightly, to put secondary egXMLs into p el…
bansp Oct 23, 2022
b07b342
fixes for targetDatcat
bansp Oct 23, 2022
9aadcae
eliminate barbarous wording
bansp Oct 23, 2022
2ef288d
minor fixes
bansp Oct 24, 2022
bfbab9b
Update P5/Source/Guidelines/en/DI-PrintDictionaries.xml
bansp Oct 25, 2022
13bff5c
implements a lot of Syd's remarks and probably misses some
bansp Oct 25, 2022
baa79cc
removed rend="italic" from `term` (I must have got it wrong)
bansp Oct 25, 2022
f9daa55
implement Syd's remarks
bansp Oct 25, 2022
3092b35
use `term` instead of `soCalled`
bansp Oct 25, 2022
00d3684
adding a prefixDef snippet by Syd
bansp Oct 25, 2022
23 changes: 13 additions & 10 deletions P5/Source/Guidelines/en/DI-PrintDictionaries.xml
@@ -6,7 +6,7 @@ See the file COPYING.txt for details.
$Date$
$Id$
-->
<?xml-model href="https://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<?xml-model href="https://jenkins-paderborn.tei-c.org/view/LingSIG/job/TEIP5-LingSIG-deadISOCat/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<div xmlns="http://www.tei-c.org/ns/1.0" type="div1" xml:id="DI" n="12">
<head>Dictionaries</head>

@@ -2658,25 +2658,28 @@ equivalently "poj." in Polish, or "Ez." in German, etc., what is actually referr
grammatical value that can be rendered with a plethora of markers, depending on the publisher, language, or lexicographic tradition.
In order to signal that this variety of surface markers in fact indicate the same
underlying value, it is possible to align them with an external inventory of standardized
values. The TEI provides means to align grammatical categories as well as their content
with the ISOcat reference, which is a Web implementation of <ref target="#ISO-12620">ISO 12620</ref>.</p>
values. The TEI provides the <ident type="class">att.datcat</ident> attribute class for the purpose of aligning grammatical (or indeed any sort of) categories as well as their values
with a reference taxonomy of shared data categories.</p>
<p>In the example below, a fragment of the entry for <foreign>isotope</foreign> cited
in section <ptr target="#DITPGR"/> is adorned by references to ISOcat definitions for "part
of speech" (<att>dcr:datcat</att>) and "adjective" (<att>dcr:valueDatcat</att>). Depending
in section <ptr target="#DITPGR"/> is adorned by references to external definitions for "part
of speech" (<att>datcat</att>) and "adjective" (<att>valueDatcat</att>). Depending
on the status and extent of the dictionary, various strategies may be used to reduce the
redundancy of the repeated ISOcat references.<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible">
<entry xmlns:dcr="http://www.isocat.org/ns/dcr">
redundancy of the potentially redundant references.<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible">
<entry>
<!--...-->
<form>
<orth>isotope</orth>
</form>
<gramGrp>
<pos dcr:datcat="http://www.isocat.org/datcat/DC-1345" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1230">adj</pos>
<pos
datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"
valueDatcat="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"
>adj</pos>
</gramGrp>
<!--...-->
</entry>
</egXML></p>

</egXML>
In the above example, alignment is performed against the <ref target="https://www.clarin.eu/content/clarin-concept-registry">CLARIN Concept Registry</ref>.</p>
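Reader's note on this hunk (not part of the PR): the following is a minimal Python sketch of how a consumer of such an entry might collect the `datcat` and `valueDatcat` alignments. It assumes the fragment is wrapped in the TEI namespace rather than the Examples namespace used by `egXML`, and the helper name `collect_alignments` is hypothetical. The handle URIs are copied verbatim from the example above and are not dereferenced.

```python
import xml.etree.ElementTree as ET

# Entry fragment from the example above; the TEI namespace on <entry> is an assumption
ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form><orth>isotope</orth></form>
  <gramGrp>
    <pos
      datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"
      valueDatcat="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"
    >adj</pos>
  </gramGrp>
</entry>
"""


def collect_alignments(xml_text):
    """Return (element name, surface text, datcat, valueDatcat) for each aligned element."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter():
        datcat = el.get("datcat")
        value_datcat = el.get("valueDatcat")
        if datcat is None and value_datcat is None:
            continue  # element carries no alignment attributes
        local_name = el.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        rows.append((local_name, (el.text or "").strip(), datcat, value_datcat))
    return rows


alignments = collect_alignments(ENTRY)
for row in alignments:
    print(row)
```

Keying on the two attributes rather than on the surface string is the point of the alignment: an entry marked "poj." or "Ez." instead of "adj" would still be found through its identifiers.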
</div>
<div type="div3" xml:id="DIMVBO">
<head>Retaining Both Views</head>
84 changes: 65 additions & 19 deletions P5/Source/Guidelines/en/FS-FeatureStructures.xml
@@ -6,7 +6,7 @@ See the file COPYING.txt for details.
$Date$
$Id$
-->
<?xml-model href="https://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<?xml-model href="https://jenkins-paderborn.tei-c.org/view/LingSIG/job/TEIP5-LingSIG-deadISOCat/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<div xmlns="http://www.tei-c.org/ns/1.0" type="div1" xml:id="FS" n="16"><head>Feature Structures</head>
<!-- this para is repeated at the start of ISO std -->
<p>A <term>feature structure</term> is a general purpose data
@@ -269,17 +269,29 @@ combinations of feature names and values are used is to provide a
<p>Whether at the level of feature-system declarations, feature- and
feature-value libraries, or individual features, it is possible to
align both feature names and their values with standardized external
data category repositories such as ISOcat. <note place="bottom">See
data category repositories. <note place="bottom">See
section <ptr target="#DIMVLV"/> for more discussion of the need and
rationale for ISOcat references.</note> In the following example, both
rationale for aligning the content of grammatical (and other) descriptions with
standardized external taxonomies.</note> In the following example, both
the feature <val>part_of_speech</val> and its value
<val>#commonNoun</val> are aligned with the respective definitions
provided by <ref target="#ISO-12620">ISO DCR (Data Category
Registry)</ref>, as implemented by ISOcat.
<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible" source="#UND">
<fs xmlns:dcr="http://www.isocat.org/ns/dcr">
<!--...-->
<f name="part_of_speech" dcr:datcat="http://www.isocat.org/datcat/DC-1345" fVal="#commonNoun" dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
<val>NN</val> (standing for "common noun") are aligned with the respective definitions
provided by the <ref target="https://www.clarin.eu/content/clarin-concept-registry">CLARIN Concept Registry (CCR)</ref>.
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<fs>
<f name="part_of_speech" datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
<symbol valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545" value="NN"/>
</f>
<!-- ... -->
</fs>
</egXML></p>
<p>Since the above representation takes up a lot of space and quickly becomes redundant and
error-prone, it is possible to delegate the task of aligning with external repositories to
elements such as <gi>fLib</gi>, <gi>fvLib</gi>, <gi>fDecl</gi>, or <gi>fsDecl</gi> to reduce the feature
Member:

Possible delegation to <fLib>, <fvLib>, <fDecl>, or <fsDecl> is listed here, but only to <fvLib> or <taxonomy>, below (~10 lines down).

Member:

But that makes sense: first, there is a general statement on what can get aligned to reduce redundancy of individual feature representations. Then, a specific statement about the specific example mentions fvLib, which is illustrated, and taxonomy, which is not illustrated (but I felt it should not go unmentioned).

representation at hand and to increase its readability at the same time, as shown below.
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<fs>
<!-- ... -->
<f name="POS" fVal="#commonNoun"/>
<!-- ... -->
</fs>
</egXML></p>
@@ -352,8 +364,22 @@ each one represented as an identifiable <gi>fs</gi> element within a
Other <gi>f</gi> elements may invoke them by reference, using the
<att>fVal</att> attribute; for example, one might use them in a
feature value pair such as: <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#UND"><f name="dental-fricative" fVal="#T.DF"/> </egXML> rather than expanding the hierarchy of the
component phonological features explicitly. </p>
<p>Feature structures stored in this way may also be associated with
component phonological features explicitly.</p>
<p>The feature structure that concludes section <ptr target="#FSSY"/> above, identifying the
value of some part of speech to be a "common noun" can be used in tandem with a
HelenaSabel (Member), Oct 20, 2022:

I’d recommend using an element instead of the straight quotes. Maybe <soCalled>common noun</soCalled> (or <q>). Same comment goes for line 277 above, and 1625 below.

sydb (Member), Oct 20, 2022:

I agree completely, and the same goes for lines 2664 & 2665 in DI chapter (previous file), too. I do not think <soCalled> is appropriate, though. (But will be happy to change my mind if someone argues convincingly that the part of speech being discussed is not a common noun, rather said encoding is in error; or some such.) My instinct is either <q> or <term>?
In any case, though, the comma afterwards is missing (before “can” which perhaps should be “may”), and the sentence itself feels a bit run-onny:

The feature structure that concludes section [FSSY] above, identifying the value of some part of speech to be a “common noun” can be used in tandem with a feature-value library, which offers a way to encode a grammatical “tagset”, in this case containing labels for parts of speech: [EXAMPLE]

Member:

sigh, true, I wonder how many straight quotes I have left there... that may take a while ;-)

Member:

Tried to use <q> for quoted bits, and <soCalled> for, e.g., names of concepts. Wow, semantic markup... cool. :-)

Member:

And it's just struck me that maybe (just maybe) some of those could be <term>s, but I think that sorting that out needs a bit of quiet and focus, and some target framework, such as that sketched in #2358 -- where you'd, for example, get a pop-up over a <term>, citing its definition.

Member:

I believe I have eliminated simple quotes in the spec and in the FS chapter, with <q> where the quotes were straightforward. I even ended up removing quotes where they were not really needed, it seems.
There are two sentences in FS that feel "run-onny" (if I interpret that correctly) -- in both cases they talk about some functionality that should/may accompany a suggestion exemplified in an earlier section. The potentially unstable word in both is "concludes", because one could imagine the target section to grow, one day. But today, they are true, and I think that when the FS chapter receives a solid revision, these two instances will be modified if needed.

Member:

... and I've touched the DI chapter as well.

feature-value library, which offers a way to encode a grammatical <term>tagset</term>, in this case
containing labels for parts of speech:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<fvLib n="POS values">
<symbol xml:id="commonNoun" value="NN" datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/>
<symbol xml:id="properNoun" value="NP" datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
<!-- ... -->
</fvLib>
</egXML>
Such a feature-value library combines the standard short symbolic label for a part of speech (e.g., "NN") with a
mnemonic identifier that can be referenced by means of <att>fVal</att>, and with a persistent identifier, maintained
in a public reference taxonomy repository together with the basic definition of the given concept.</p>
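Reader's note on this hunk (not part of the PR): the delegation described here can be sketched in Python as a lookup from an `fVal` pointer to the library `symbol` it names. This is only an illustration under stated assumptions: the wrapping `div`, the TEI namespace, and the helper name `resolve_fval` are hypothetical, and only local `#id` pointers are handled. Attribute values are copied as they appear in this revision of the diff and are not checked against the registry.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree exposes xml:id

# The fvLib from the hunk above plus a referencing <f>; the wrapper <div> is an assumption
DOC = """
<div xmlns="http://www.tei-c.org/ns/1.0">
  <fvLib n="POS values">
    <symbol xml:id="commonNoun" value="NN"
      datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/>
    <symbol xml:id="properNoun" value="NP"
      datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
  </fvLib>
  <fs>
    <f name="POS" fVal="#commonNoun"/>
  </fs>
</div>
"""


def resolve_fval(root, f_el):
    """Follow a local '#id' pointer in fVal to the element carrying that xml:id."""
    target = f_el.get("fVal", "")
    if not target.startswith("#"):
        return None  # external pointers are out of scope for this sketch
    wanted = target[1:]
    for el in root.iter():
        if el.get(XML_ID) == wanted:
            return el
    return None


root = ET.fromstring(DOC)
f_el = root.find(f".//{TEI}f")
symbol = resolve_fval(root, f_el)
print(f_el.get("name"), symbol.get("value"), symbol.get("datcat"))
```

The individual `<f>` stays short because the label and the persistent identifier live once in the library, so a correction to an identifier is made in a single place.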
<p>Feature structures stored in the way presented in this section may also be associated with
the text which they are intended to annotate, either by a link from the text
(for example, using the TEI global <att>ana</att> attribute), or
by means of stand-off annotation techniques (for example, using the TEI
@@ -1176,15 +1202,14 @@ using an XML identifier are discussed in <ptr target="#SAUR"/></note>
way of accomplishing this is to add an XML identifier to each
<gi>fsDecl</gi> element in <ident type="file">example.xml</ident>:
<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible" source="#UND">
<!-- ... -->
<fsdDecl>
<fsDecl type="gpsg" xml:id="GPSG">
<fsdDecl>
<fsDecl type="gpsg" xml:id="GPSG">
<!-- information about this type -->
</fsDecl>
<fsDecl type="lex" xml:id="LEX">
</fsDecl>
<fsDecl type="lex" xml:id="LEX">
<!-- information about this type -->
</fsDecl>
</fsdDecl>
</fsDecl>
</fsdDecl>
</egXML>
(Although in this case the XML identifier is simply an uppercase
version of the type name, there is no necessary connection between the
@@ -1593,6 +1618,27 @@ in a construction. Since PFORM is specified above as an open set,
This example makes use of a negated value: <code>&lt;vNot&gt;&lt;string/&gt;&lt;/vNot&gt;</code>
subsumes any string that is not the empty
string.</p>
<p>For the reduced feature structure that concludes section <ptr target="#FSSY"/> above and
identifies the value of some part of speech to be a "common noun", it is possible to align
the concept of part of speech with its definition and persistent identifier thanks to the
<ident type="class">att.datcat</ident> attribute class, which supplies the
<att>targetDatcat</att> attribute that connects the modeled XML object
with the appropriate locus in a reference taxonomy, as shown below:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<fDecl name="POS" targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
<fDescr>part of speech (morphosyntactic category)</fDescr>
<vRange>
<vAlt>
<symbol value="NN" datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
<symbol value="NP" datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
<!-- ... -->
</vAlt>
</vRange>
</fDecl>
</egXML>
The above example declares the feature "POS" as instantiating the corresponding concept defined in a reference
taxonomy / ontology, and defines the range of values of the feature at hand by listing the appropriate
alternatives, together with their external persistent identifiers.</p>
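Reader's note on this hunk (not part of the PR): a minimal Python sketch of how the declaration above could be read back, pairing the feature's `targetDatcat` with the permitted values and their identifiers. It assumes the TEI namespace on `fDecl`; the names `declared_values` and `is_permitted` are hypothetical. The check covers only the flat `vRange/vAlt/symbol` shape shown here and makes no attempt at full feature-system validation.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The fDecl from the hunk above; the TEI namespace declaration is an assumption
FDECL = """
<fDecl xmlns="http://www.tei-c.org/ns/1.0" name="POS"
  targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
  <fDescr>part of speech (morphosyntactic category)</fDescr>
  <vRange>
    <vAlt>
      <symbol value="NN" datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
      <symbol value="NP" datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
    </vAlt>
  </vRange>
</fDecl>
"""


def declared_values(fdecl):
    """Map each symbol value listed under vRange/vAlt to its datcat identifier."""
    return {
        sym.get("value"): sym.get("datcat")
        for sym in fdecl.findall("tei:vRange/tei:vAlt/tei:symbol", NS)
    }


fdecl = ET.fromstring(FDECL)
feature_concept = fdecl.get("targetDatcat")  # concept the feature itself instantiates
allowed = declared_values(fdecl)


def is_permitted(value):
    """True if the value is one of the alternatives listed in the declaration."""
    return value in allowed


print(fdecl.get("name"), feature_concept)
for value, concept in allowed.items():
    print(value, concept)
```

Here `targetDatcat` identifies the concept behind the feature being declared, while each `datcat` on a `symbol` identifies the concept behind one of its values, which is why the sketch keeps the two apart.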
<p>Note that
the class <ident type="class">model.featureVal</ident> includes all possible
single feature values, including feature structures, alternations