Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing elements with a default value if the element can be multiple #409

Open
hubblec4 opened this issue May 27, 2021 · 8 comments
Open
Labels
clarifications Improve the readability and comprehension of the specs errata-rfc8794 Errors found in RFC 8794
Milestone

Comments

@hubblec4
Copy link

hubblec4 commented May 27, 2021

11.1.19. Note on the use of default attributes to define Mandatory EBML Elements

In this section we find an overview about when an element is required to write.

But the attribute multiple (maxOccurs > 1) is not taken into account.

As example: the ChapLanguage element. It has a default vaule, since a while it has also a mandatory attribute and it is multiple.

If we use the ChapLanguage element multiple times within a ChapterDisplay element, and one of the element has a value of "eng" which is the default value, then this ChapLangaue element must be written.
If this element is not written there is no way to get/restore this info on reading.

Here is a link to MKVToolNix where I describ this issue and you find a sample xml for testing.

@robUx4 robUx4 added clarifications Improve the readability and comprehension of the specs errata-rfc8794 Errors found in RFC 8794 labels May 30, 2021
@robUx4
Copy link
Contributor

robUx4 commented May 30, 2021

There's indeed an issue for the general case. I understood the issue better after reading this commit.

The element is mandatory so it must be either written explicitly or omitted and then the default value is used. If it is ommited We can assume the default value is used. But if there's a second value to use we can't really tell if the default value is also assumed to be present or not.

IMO this logic is not possible, a mandatory element with a default value MUST be unique (maxOccurs=1).

I think it would work for ChapLanguage as well. I cannot think of a use case where 2 languages would describe a chapter string (apart from the other ChapLanguageIETF value which should be used if present). If anything just double the ChapterDisplay item if there are 2 languages. There might be files where there is more than one ChapLanguage set in a ChapterDisplay, although from the mkvmerge commit, it seems it was never the case at least with English.

robUx4 added a commit to ietf-wg-cellar/matroska-specification that referenced this issue May 30, 2021
Following the discussion in ietf-wg-cellar/ebml-specification#409
It could be assumed that English is always present because ChapLanguage is
mandatory and has a default value.

To avoid this interpretation we should have one language (and one IETF variant)
per ChapterDisplay item. We already have to use many ones per language. The merge
of different values because they have the same value is unlikely (but it may
exist in the real world).
@robUx4
Copy link
Contributor

robUx4 commented May 30, 2021

An other alternative is to specify that the default value only applies if the element is unique. In other words if you have to store 2 values and one of them is the default value, the value must be explicitly set (like the mkvmerge fix). But that's a more complex thing to handle in muxers as you need to check each sibling elements before being able to write an element, write the value of the element, or use zero-length. An extra loop was added to mkvmerge to handle this case, the same would need to be done in libebml (and possibly in various places everytime we check if an element needs to be saved or not). A similar change may be needed in libavformat and libwebm as well...

That's why I think restricting ChapLanguage to a single language per item is the easiest path. There's probably not a lot of code that write multiple language for one string, nor read it that way (in VLC the value is logged but not used).

@mbunkus
Copy link
Contributor

mbunkus commented May 30, 2021

But if there's a second value to use we can't really tell if the default value is also assumed to be present or not.

I disagree. I find the specs pretty clear on this point: a reader has to use the default value if and only if the element isn't present in the master at all. If it is present, no matter how many times, the default value simply doesn't come into play.

This in turn leads to a pretty easy-to-follow rule for writers: if you need to write more than one element of the same type in the same master, they must all be written, no matter their value.

Therefore I'd prefer not to change the specs retroactively (even though I'd probably make both ChapterLanguage and ChapterCountry not multiple if I were to re-design the spec today) as you proposed in your second comment:

An other alternative is to specify that the default value only applies if the element is unique. In other words if you have to store 2 values and one of them is the default value, the value must be explicitly set (like the mkvmerge fix).

Yes, it's more complex for a writer, though the algorithm itself isn't that complex at all:

  1. Iterate over all children in a master & count the number of child elements present by element ID.
  2. Iterate over all children in a master again & remove all those that are mandatory, have their current value equal the default value & which only occur once within the master (according to the result of step 1).

This (and its implementation in mkvmerge) is a generic algorithm that can be used for all types of elements, not just the ones in question here.

I just don't like changing specs retroactively, especially as there are programs out there ( @hubblec4 's ChapterEditor) that already use multiple languages & countries per display.

That's why I think restricting ChapLanguage to a single language per item is the easiest path.

If we decide to change the specs retroactively that way, we should also make ChapterCountry not multiple. Even though it isn't mandatory and doesn't pose the same problem, it doesn't make sense semantically to allow multiple countries but only a single language.

And while we're at it, ChapLanguageIETF should then also not be multiple anymore as, again, it wouldn't make sense semantically to allow multiple IETF languages but only one legacy language.

@hubblec4
Copy link
Author

hubblec4 commented May 30, 2021

IMO this logic is not possible, a mandatory element with a default value MUST be unique (maxOccurs=1).

You are wrong with this in my opinion. I have also implemented this feature this night and it was very easy.

I cannot think of a use case where 2 languages would describe a chapter string

Wow, why not?
This happens so often to me!
In the Star Trek series you will find a lot of chapters which have a name of a person, and this is the same in all 5 languages.
It makes for me no sense to insert 5 ChapterDisplay with the same chapter name.

@hubblec4
Copy link
Author

hubblec4 commented May 30, 2021

In german and english you have so often the same word and so I don't need to insert a second ChapterDisplay with the same chapter string.

@hubblec4
Copy link
Author

That's why I think restricting ChapLanguage to a single language per item is the easiest path.

No. This breaks a lot of files(which I have) and I have to re-write my chapterEditor.

@robUx4
Copy link
Contributor

robUx4 commented Jun 6, 2021

But if there's a second value to use we can't really tell if the default value is also assumed to be present or not.

I disagree. I find the specs pretty clear on this point: a reader has to use the default value if and only if the element isn't present in the master at all. If it is present, no matter how many times, the default value simply doesn't come into play.

I don't think it's that crystal clear in RFC 8794: https://www.rfc-editor.org/rfc/rfc8794.html#section-11.1.6.8 and https://www.rfc-editor.org/rfc/rfc8794.html#name-note-on-the-use-of-default-

There's even a table with all cases explaining when you should write it or not. Not a mention if the value is unique or not.

However the definition of default may lead towards that interpretation:

If an Element is mandatory (has a minOccurs value greater than zero) but not written within its Parent Element or stored as an Empty Element, then the EBML Reader of the EBML Document MUST semantically interpret the EBML Element as present with this specified default value for the EBML Element. An unwritten mandatory Element with a declared default value is semantically equivalent to that Element if written with the default value stored as the Element Data.

It defines when a missing element should be interpreted as the default value present. It doesn't say when another value is present. One could assume if it's present that rule doesn't apply and so we're back to normal. One could also argue that the last sentence can be interpreted as the mandatory+default element is always present. In that interpretation the value of maxOccurs would determine if both written and unwritten values could be used or only the written one. It doesn't seem right but it fits with RFC 8794 as it is now.

Most of our elements with a default value have a maxOccurs=1 but some don't: ChapLanguage, TagTrackUID, TagEditionUID, TagChapterUID and TagAttachmentUID. One could argue the tag UIDs should be unique (the tag interpretation is very vague right now, they are at least supposed to be exclusive). But the problem remains for ChapLanguage.

IMO something like what @mbunkus says should be added to the default section of EBML:

if you need to write more than one element of the same type in the same master, they must all be written, 
no matter their value.

@hubblec4
Copy link
Author

hubblec4 commented Jun 6, 2021

There's even a table with all cases explaining when you should write it or not. Not a mention if the value is unique or not.

Yeah, I would say the mention if the value is unique or not is missing and should be added to these table.

Most of our elements with a default value have a maxOccurs=1 but some don't: ChapLanguage, TagTrackUID, TagEditionUID, TagChapterUID and TagAttachmentUID.

Only the ChapLanguage element has all 3 attribute (mandatory, multiple and a default value).
The other Tag_UID elements don't have a mandatory attribute.

@robUx4 robUx4 added this to the new-rfc milestone Jul 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarifications Improve the readability and comprehension of the specs errata-rfc8794 Errors found in RFC 8794
Projects
None yet
Development

No branches or pull requests

3 participants