Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression: duplicate idents in *Specs #681

Merged
merged 2 commits into from
Jun 24, 2024
Merged

Conversation

joeytakeda
Copy link
Contributor

Attempting to resolve the problem of duplicate idents in specs (as mentioned in #678 and #645 and potentially others). At the time of my filing this PR, Test1 is failing (in particular, test21), so more investigation (and more testing) is required before this can be merged.

Refining how the zero-th pass for ODD processing handles gathering up duplicate named specs
@joeytakeda joeytakeda linked an issue May 6, 2024 that may be closed by this pull request
@joeytakeda
Copy link
Contributor Author

Fixed a cardinality constraint, so all tests are now passing. Going to mark this as ready for review

@joeytakeda joeytakeda marked this pull request as ready for review May 6, 2024 02:58
@lb42
Copy link
Member

lb42 commented May 6, 2024

Will this also fix #680? Happy to test it if so!

@joeytakeda
Copy link
Contributor Author

Will this also fix #680? Happy to test it if so!

@lb42 : please do! I haven’t checked to see if this helps with the issues when chaining (though I suspect it might)

@joeytakeda joeytakeda mentioned this pull request May 7, 2024
@joeytakeda joeytakeda linked an issue May 7, 2024 that may be closed by this pull request
Copy link
Member

@sydb sydb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested a simple chain (customization A reads output of odd2odds.xsl on customization B), and it failed miserably using dev branch Stylesheets and succeeded using this branch. (And the output RNC is valid and looks reasonable, and the output HTML looks no worse than expected.)
I have also run all tests in Test/ and Test2/ (except the one that requires XeLaTeX), and they pass.

I am mildly trepidatious because I do not actually understand the change made, but still giving a thumbs up because it looks reasonable, and (as stated above) every output I look at is either the same or better.

Copy link
Member

@ebeshero ebeshero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joeytakeda The code changes look sensible to me. Kudos for typing the variable to help clarify this code, and for moving to value (one to one) comparison.

@ebeshero
Copy link
Member

ebeshero commented Jun 19, 2024

This is passing Test 1 for me, and mostly Test2 (except I think something is borked about the " [exec] BUILD FAILED
[exec] /stylesheet/common/teianttasks.xml:205: Problem reading /stylesheet/Test2/outputFiles/temp-for-ant.zip") @joeytakeda is that the same (known) problem you see?

@lb42 Have you had a chance to test this on a chained ODD? I'd like to see how this does on one of your chained scenarios and see how much this simple yet profound fix addresses those other tickets. We're preparing to refridge (or is that heat) this Friday!

@lb42
Copy link
Member

lb42 commented Jun 21, 2024

I attach a zip containing a compiled ODD and another ODD which uses it. This produces a schema OK, but has trouble running odd-xhtml conversion : both in oxygen and using teitohtml :-(
problem.zip
Message is System ID: /home/lou/.com.oxygenxml/extensions/v25.0/frameworks/tei/tei/xml/tei/stylesheet/common/functions.xsl
Scenario: TEI ODD XHTML
XML file: /home/lou/Public/Schemas/ODD/eltec-0.xml
XSL file: /home/lou/.com.oxygenxml/extensions/v25.0/frameworks/tei/tei/xml/tei/stylesheet/odds/odd2odd.xsl
Document type: TEI ODD
Engine name: Saxon-HE 11.4
Severity: fatal
Problem ID: XPTY0004
Description: A sequence of more than one item is not allowed as the first operand of 'gt' A sequence of more than one item is not allowed as the first operand of 'gt' (@versionDate="2005-12-24", @versionDate="2005-12-24", @versionDate="2005-12-24")

@lb42
Copy link
Member

lb42 commented Jun 21, 2024

I spoke too soon -- the RNG generated is also wrong.

Advice on how to test/ confirmation that "it works for me" both gratefully received!

@lb42
Copy link
Member

lb42 commented Jun 21, 2024

The script I run says
teitoodd --localsource=/home/lou/Public/TEI/P5/p5subset.xml eltec.xml eltec-library.xml
teitorelaxng --localsource=/home/lou/Public/TEI/P5/p5subset.xml eltec-0.xml ../eltec-0.rng
teitohtml --odd --localsource=/home/lou/Public/TEI/P5/p5subset.xml eltec-0.xml ../Doc/eltec-0.html
I see I didnt include the mother odd in the zip, so here it is again
problem.zip

@sydb
Copy link
Member

sydb commented Jun 21, 2024

@lb42
I get An include with href 'eltec-body.xml' failed, and no fallback element was found from any attempt to process eltec-0.xml. I think we can just comment out that one <xi:include> line, because nothing in the <body> should change this problem, right? (I.e., I do not see any <specGrpRef>s that would be unresolvable without the <body> element.)

@lb42
Copy link
Member

lb42 commented Jun 21, 2024

Ah yes, just ignore that include. It's prose waffle not relevant to the odd

@sydb
Copy link
Member

sydb commented Jun 21, 2024

So I tried the the same 3 cmds @lb42 describes, first using the Stylesheets dev branch, and then using this branch. (Note though, that because I did not specify --localsource, I am building against the version of p5subset in each Stylesheets repo; I checked, though, they are the same as each other, although probably not the same as the local one Lou used.)

  • teitoodd eltec.xml eltec-library.xml
  • teitorng eltec-0.xml eltec-0.rng
  • trang eltec-0.rng eltec-0.rnc
  • teitohtml --odd eltec-0.xml eltec-0.html

I then compared each output of these cmds when run using the dev branch of Stylesheets to the corresponding output when run using this branch.

The HTML files were the same (except for timestamps and comments). The eltec-library.xml files seem vastly different from one another. At least in 1 or two cases, though, the difference is that the dev-generated version has duplicate code that the version generated from this branch does not. However, there are lots of other differences, too.

The RELAX NG files, on the other hand, were only somewhat different. But at least for most of the differences which I looked at in detail, it seems to me the version from this branch is the better one.

  1. att.canonical.attributes is incorrectly defined in the dev-generated version: it is defined as 2 copies of the pattern att.canonical.attribute.ref, which itself is never defined. It is “correctly” defined in the version generated against this branch, in that there is only 1 occurrence of the att.canonical.attribute.ref and it is defined, I think properly (i.e., has example from eltec.xml).
  2. <measure> is (incorrectly) a member of model.measureLike in the dev-generated version; model.measureLike is (correctly) empty in the version generated against this branch. (This is because <measure> is removed from model.measureLike in eltec.xml.)
  3. In the defintion of the <TEI> element:
    1. The gloss “TEI document” occurs three times in a row in the dev-generated version, and (correctly) occurs only once in the version generated against this branch.
    2. The description “contains a single … or <teiCorpus> element.” occurs three times in a row in the dev-generated version, and (correctly) occurs only once in the version generated against this branch.
    3. HOWEVER, the element names in that description have lost their pointy brackets in the version generated against this branch.
    4. The reference “4. Default Text Structure 15.1. Varieties of Composite Text” also occurs 3 times in the dev-generated version.
    5. But what is really bad is that the content model is repeated 3 times in the dev-generated version
  4. I did not look carefully, but it seems that this tripling problem occurs (in the dev-generated version) for att.global and <measure>, too.
  5. The RELAX NG generated against this branch is invalid due to duplicate copies of the definition of @calendar in various elements.
  6. The RELAX NG generated against dev is invalid for the same reason, but also because the definition of <measure> is FUBAR.

I am not entirely sure what all this means other than a) ODD chaining is quite fragile, and b) while I still think @joeytakeda’s change, here, is probably an improvement, it seems to me more testing is probably in order.

@lb42
Copy link
Member

lb42 commented Jun 21, 2024

Thanks for confirming that the recent changes made to oddtoodd have fubarred my nice eltec schema!
Clearly Joey's changes improve the status quo, even if it's not clear whether they fix everything. Rolling back and insufficiently tested versions of odd2odd =might also be an option.

@ebeshero ebeshero added the priority: releaseBlocker All tickets with this priority must be cleared in advance of the freeze before a release. label Jun 21, 2024
@ebeshero
Copy link
Member

ebeshero commented Jun 21, 2024

@lb42 @sydb In our Council meeting today, we discussed the complexities of testing to try to resolve the chained ODD processing problems, and thanks to @joeytakeda for stepping us through what he tried and precisely what the problems are. We're marking this as a release blocker. We're talking about testing previous Stylesheets versions in addition to this branch to compare what's going on and see when / how we introduced the problem. More soon, we hope!

@lb42
Copy link
Member

lb42 commented Jun 21, 2024

It seems pretty clear that whatever is going wrong with my ELTeC schema has nothing to do with ODD chaining as such. There are other examples of ODD chaining which don't exhibit this behaviour, and there is a stand alone version of the ELTeC schema (attached as a text file) which does

eltec-standalone.txt

@ebeshero ebeshero added this to the Release 7.57.0 milestone Jun 21, 2024
@ebeshero
Copy link
Member

ebeshero commented Jun 24, 2024

@lb42 It turns out you're correct, that what was going wrong with ELTeC schema wasn't to do with ODD chaining at all. We investigated the problem in our Stylesheets meeting today and discovered an interesting problem: What generates the duplicate idents is specific this problem:

  • In last year's release, we moved @calendar out of its original class (from att.calendar to att.calendarSystem).
  • Your ODD attempts to delete @calendar from its original class (not from its new class).
  • Rather than doing nothing (which is what should happen), the output Relax NG actually generates duplicate @calendar attribute definitions.

We tested this with other attributes, too, and we consistently generate the same error that we're now working on in this new ticket: #687

Basically you can address this by updating your reference to @calendar in your ODDs, but we're hoping to stop this kind of error from happening at all...

@lb42
Copy link
Member

lb42 commented Jun 24, 2024

Thanks for the comment @ebeshero, but I don't think that's my problem. The eltec-standalone.txt file which I posted on this ticket does not reference @Calendar at all (the line you quote is commented out), but this file still generates errors when processed to HTML. Please try downloading my test file, renaming it to .xml, and open it with an Oxygen using the most recent framework release (4.8.44) and you will see the problem.(Interestingly, if I use the previous version of the oxygen framework, the problem goes away, which suggests strongly that this is something you guys introduced...

@ebeshero
Copy link
Member

ebeshero commented Jun 24, 2024

@lb42 No indeed, we found the problem in eltec.xml, where from line 122 we see the following:

     <classSpec type="atts" ident="att.datable" mode="change">
      <attList>
       <attDef ident="calendar" mode="delete"/>
       <attDef ident="period" mode="delete"/>
      </attList>
     </classSpec>

We found the Relax NG generated from this ODD as well as the HTML to be bearing multiple definitions of the very @calendar attribute it was calling to delete, and @period to be handled just fine.

You're right that this is based on something we changed. That change was simply moving @calendar out of att.datable and into a new class, att.calendarSystem. The processing of that should not have been generating duplicate stuff, but it does, and we think we've figured out where and how, and have a (possible/quite likely) repair for it. See #687.

@joeytakeda
Copy link
Contributor Author

Hi @lb42 — just for completeness' sake, I ran teitorng and teitohtml on the eltec-standalone file you sent using the dev branch (which now includes the above change), which ran without errors.

I believe your issue here (as it was in #645 and #678) was duplicate elementSpecs with the same ident (in this case, it was the elementSpec for the TEI element, defined around lines 147, 319, and 888). This was a regression in the Stylesheets (caused by change to ensure uniqueness in element names across namespaces), but since none of our test files had duplicate elementSpecs, this was never caught.

At the very least, this will make it into the new release (which should occur within the next few weeks). As far as I understand it, though, the oXygen framework will only update with a new oXygen release (though @martindholmes will know more about that)

@lb42
Copy link
Member

lb42 commented Jun 25, 2024

Sorry I was insufficiently clear. "the problem" i am referring to is nothing to do with the @Calendar related bug (thanks for fixing which). The file "eltec-standalone" doesn't use eltec.xml -- as the name suggests it is a standalone unchained version. But it has something wrong with it because it fails to generate HTML. The message generated is
in XSL file: /home/lou/.com.oxygenxml/extensions/v25.0/frameworks/tei/tei/xml/tei/stylesheet/odds/odd2odd.xsl
A sequence of more than one item is not allowed as the first operand of 'gt' (@versionDate="2005-12-24", @versionDate="2005-12-24", @versionDate="2005-12-24")
Maybe I should raise this as a separate issue.

@lb42
Copy link
Member

lb42 commented Jun 25, 2024

Ah, I have now seen Joey's comment. Thanks! There is indeed a duplicate declaration for TEI in the standalone file (there are three in fact) and removing it removes the problem. BUT this is very worrying: the ODD system is intended to permit multiple declarations for the same object. A processor has to know how to reconcile/unify them sensibly...

@martindholmes
Copy link
Contributor

martindholmes commented Jun 25, 2024

@lb42 I don't believe there is a canonically sensible way to handle duplicate definitions, and the ODD spec doesn't provide such an algorithm, so I think ATOP will treat them as errors and argue that they should be invalid.

@lb42
Copy link
Member

lb42 commented Jun 25, 2024

@martindholmes It rather depends what you mean by duplicate definition. If i supply an elementspec containing just an exemplum for example is that a duplicate?

@martindholmes
Copy link
Contributor

Yes. You can devise scenarios in which a human could effortlessly combine the instances into a coherent whole, but it's difficult to write an algorithm to determine when that pertains and when it doesn't. And if you can easily combine them, then why not do it in your ODD?

@lb42
Copy link
Member

lb42 commented Jun 25, 2024

@martindholmes Seems to me that if i say <elementSpec ident="foo" mode="change"> it is plausible to raise an error, or at least a warning, in the case where there ISN'T a duplicate element with ident foo. So it rather depends what you are classing as "duplicate". If there are two or more <elementSpec ident="foo" mode="change"> I can see why a lazy ODD processor might not wish to go through the tedious business of deciding whether the various changes are mutually compatible, but my understanding is that the ODD system requires such capabilities all over the place. Of course atoc is free to choose whether or not to provide them, but they are not inherently nonsensical. You might well when authoring your ODD regard the production of examples as a separate exercise from defining the content models etc. and so provide them in a separate specGroup. You might even want to have different sets of examples for different applications of the same ODD subset. So you might have lots of "duplicate" specs containing just an exemplum.

@martindholmes
Copy link
Contributor

@lb42 As far as we know (based on all the ODDs we collected as part of our initial survey when starting ATOP), you're the only person who uses duplicate elementSpecs -- there was one other case but it was a genuine error, with the same elementSpec appearing twice. I think if you have a very clear idea of exactly how an ODD processor should resolve all the potential permutations arising from a multiplicity of elementSpecs with the same @ident and namespace, in the various orders in which they could appear, then I think it would be up to you to lay out the algorithm, and up to the Council to decide whether to adopt it as part of the ODD specification. But since most ODD writers never do this, and there is no formal algorithm in the specification, I think it makes more sense for ATOP to treat it as an error. You may think that's lazy, but we also have to consider the downstream effects of writing lots of code to resolve unpredictable combinations of specs without any guidance from the spec, and then condemn future maintainers to wrestle with it. @sydb and @HelenaSabel, any thoughts on this?

@ebeshero ebeshero deleted the iss678_duplicateIdents branch June 25, 2024 16:15
@lb42
Copy link
Member

lb42 commented Jun 25, 2024

I am happy to withdraw the provocative term "lazy"! I don't think you're really addressing my point though -- which is that all ODD writers, including ones not yet born, need clear guidance on how far they need to go in combining into a single spec all its constituent parts.

@ebeshero
Copy link
Member

ebeshero commented Jun 25, 2024

I have to second @martindholmes here after we've repeeatedly struggled with repairs and tests for odd2odd.xsl. Trying to figure out which conditions are being handled and when is a pretty complicated process in the current odd2odd, and trying to account for them in ATOP won't be easy.

Anyway, @lb42 's ELTeC files did prove immensely helpful for finding that serious problem with processing re-located attribute classes! But having to reconcile multiple elementSpecs sharing the same @ident in a single standalone ODD does seem an extraordinary expectation. I could see these arising in separate files in a chaining process, but shouldn't the standalone at least reconcile them into a single elementDef? And maybe this discussion of guidance for handling multiple constituent parts of odds, chained, and unchained, should be a new ticket.

@martindholmes
Copy link
Contributor

@ebeshero The way I conceive of chaining is that it's a sequence of discrete steps, so no matter how long the chain, the processor should only ever have to resolve one unique spec in the customization against one unique spec in the preceding source ODD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: releaseBlocker All tickets with this priority must be cleared in advance of the freeze before a release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

odd2odd still broken Duplicate attributes in odd → rng conversion with Release 7.56.0
6 participants