Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC changed semantics wrt. default values & non-mandatory elements #394

Open
mbunkus opened this issue Feb 2, 2021 · 14 comments
Open

RFC changed semantics wrt. default values & non-mandatory elements #394

mbunkus opened this issue Feb 2, 2021 · 14 comments
Labels

Comments

@mbunkus
Copy link
Contributor

mbunkus commented Feb 2, 2021

I've verified using the Wayback machine[1] that right before the website was updated to match the released RFC the semantics of default values was as follows:

Default Values
The default value of an element is assumed when not present in the data stream. It is assumed only in the scope of its upper-element (for example Language in the scope of the Track element). If the upper element is not present or assumed, then the element cannot be assumed.

Note that there is no distinction made for whether an element is mandatory or not.

Now the RFC turns this into the following:

6.1. Data Size Format

If an Empty Element has a default value declared, then the EBML Reader MUST interpret the value of the Empty Element as the default value.

and

11.1.6.4. minOccurs

EBML Elements with "minOccurs" set to "1" that also have a default value (see Section 11.1.6.8) declared are not REQUIRED to be stored but are REQUIRED to be interpreted; see Section 11.1.19.

and the whole section 11.1.19, which only talks about mandatory elements.

So to recap:

  • Before the RFC all elements with default values would not have be written to the stream.
  • After the RFC non-mandatory elements are suddenly handled differently.

This is not just theoretical. Pretty much all Matroska readers out there treat mandatory and non-mandatory elements the same wrt. default values. Most popular example is the track language element which is non-mandatory with a default value of eng. Software such as MKVToolNix, VLC, ffmpeg, mkclean all use the value eng if the element isn't present.

The RFC retroactively changes the meaning of nearly all files ever created. No matter what we might think about how default values should be handled, such a radical, retroactive change is unacceptable in my opinion. Unfortunately I didn't realize this during the RFC process, so yeah, mea culpa.

[1] https://web.archive.org/web/20190630173942/https://matroska.org/technical/specs/notes.html

@mbunkus mbunkus added the bug label Feb 2, 2021
@mbunkus
Copy link
Contributor Author

mbunkus commented Feb 2, 2021

libebml's code follows the old interpretation, too. Here is the corresponding bug I've filed yesterday. Like I said, this is the RFC changing established interpretation of the specifications as they had been until the RFC.

@mkver
Copy link
Contributor

mkver commented Feb 2, 2021

The linked old version even contains an even clearer statement:

Default

  • The default value of the element.
  • When the element's parent is present and the element is not, the presence of the element is assumed virtually with the default value. When the element's parent is not present, the element's default value is ignored.
  • Numeric value are expressed in decimal.

While the RFC indeed changed this EBML part of the specs, there is an easy solution that does not require reopening the EBML specs: Declare the relevant Matroska elements as mandatory to reinstate the old behaviour.

@mbunkus
Copy link
Contributor Author

mbunkus commented Feb 2, 2021

Sure, we could go that route.

I'm also concerned about existing knowledge out there. For as long as EBML+Matroska has existed, developers were used to treat elements with default values one way. Now the RFC comes along and changes the semantics very subtly. This will trip man y developers up. It tripped me up in the discussion about the new track flags & their default value.

@robUx4
Copy link
Contributor

robUx4 commented Feb 6, 2021

As said on the libebml side, there aren't that many elements affected. The most notable one being TrackEntry\Language. As you mention, pretty much all software assume that if it's not present, it's "eng" (VLC even uses an incorrect default value https://code.videolan.org/videolan/vlc/-/blob/master/modules/demux/mkv/mkv.cpp#L925). It would have little effect to make it mandatory.

  • The colorimetry values MatrixCoefficients,TransferCharacteristics and Primaries have a default value (2) that translates to "Undefined". So to have something usable, the default value is not used and therefore written in files. Making them mandatory will have zero effect. (it's a counter-intuitive default value of "do not write this value")

  • CueBlockNumber is also used in WebM so I suppose it has some uses. I'd be curious if, when value is 1, writers (lavf, mkvmerge, libvpx, libaom) they actually skip the value or not. For this element I'd rather lean towards no default value at all. It's also not used at all for seeking in VLC, dunno about other parsers. If it was added to WebM maybe some browsers use it.

  • CueReference\CueRefNumber is not in WebM and I doubt anyone is using it either to write or to read and make use of it.

  • For Tag\Targets\TargetTypeValue (also in WebM) it would make total sense to make it mandatory. You can't interpret a tag if you don't know what it applies to. If muxers didn't write the (default) value, even if it's not mandatory and should have been (under clarified rules) parsers likely generate this value if it's not there otherwise they can't make use of it. That's what VLC does. So I would also make it mandatory.

So for most element I think we can make them mandatory.

@robUx4
Copy link
Contributor

robUx4 commented Feb 6, 2021

As for the old spec there is this notable part

Mandatory

    This element is mandatory in the file.
    Mandatory elements with a default value may be left out of the file. In the absence of a mandatory element,
    the element's default value is used.
    A mandatory element is not written if its parent is not in the file. 

You could read it as "you can leave out an element 'only' if it's mandatory". Which contradicts what the Default definition said but IMO this was always the intention of mandatory elements (which is a more suggestive name as minOccurs="1").

Interestingly WebM took our colorimetry table from our spec but don't explain how to interpret it.

@mkver
Copy link
Contributor

mkver commented Feb 7, 2021

The RFC made two changes with respect to default values:

  1. It unambiguously stated that non-mandatory elements must be physically present to be semantically present even when they have a default value.
  2. It specified rules regarding the value of zero-length elements. In case of strings/UTF-8 (like the TrackEntry\Language element) the new rules deviate from earlier rules; in case of integers and floats, they specify behaviour in cases that were invalid before ("Unsigned Integer - Big-endian, any size from 1 to 8 octets"), ergo they can't break any valid files.

It seems that several parsers (libebml and libavformat) already accepted zero-length elements and interpreted them as containing a value of zero. Such a parser already parses zero-length elements with a default value of zero correctly, the second point is no problem. But if an element is not mandatory, it is nevertheless affected by the first change. So there are more affected elements, namely at least DisplayUnit and DisplayHeight and DisplayWidth. The latter have default values if DisplayUnit is 0 (i.e. pixels), yet they are not mandatory.

@mkver
Copy link
Contributor

mkver commented Feb 7, 2021

And interpreting "Mandatory elements with a default value may be left out of the file." as "you can leave out an element 'only' if it's mandatory" is an interpretation that is at odds with what is said about default values. If there is a self-contradictory and a non self-contradictory interpretation, the latter is of course to be preferred.

(I remember having found it odd years ago that there were elements with a default value that are not marked as mandatory, although these elements are automatically present, but I thought it was down to an oversight. After all, the rules for zero-length elements didn't exist back then, so if a element with a default value that is not physically present may not be assumed to be present, then it made no sense at all for said element to have a default value.)

@robUx4
Copy link
Contributor

robUx4 commented Feb 7, 2021

Where the RFC deviates from existing code is that a Length of 0 was assumed to be zero (or empty string), rather than the default value. That's regardless of the default value. In hindsight that's also a good use of the feature. It might make some parsing easier.

Assuming a non mandatory element to be (virtually) present just because it has a default value is just wrong. Otherwise the mandatory flag is meaningless as soon as you have a default value. But some elements should not be present unless they are actually used (think encryption for example). The original spec notes were misleading in that regard. And as I said the "correct" interpretation is the mandatory explanation that tells if an element can be left out or not.

@robUx4
Copy link
Contributor

robUx4 commented Feb 7, 2021

As for the conflict between Mandatory and Default value, the Mandatory flag is more important than the Default flag. You cannot write a file properly if you don't know what elements are mandatory or not. You can however write a file without knowing any default value at all. You just write all values.

robUx4 added a commit to ietf-wg-cellar/matroska-specification that referenced this issue Feb 7, 2021
Following the bug/inconsistency found in EBML [1] it might be better to make
the track language mandatory. It will give better past/present compatibility.

In light of the next Track Selection explanation it's also vital to know the
language when you have more than one audio/subtitle track.

This may be a problem for video tracks that were not mandatory and technically
didn't have a defined language. The flag does make sense though, even for video.
For example some text on the screen may appear differently between languages.

This change will turn every track of every file that doesn't have this value set
assume it's in English. That's probably what any system dealing with track selection
is already doing anyway given it was the official default value.

[1] ietf-wg-cellar/ebml-specification#394
@robUx4
Copy link
Contributor

robUx4 commented Feb 7, 2021

Also the original specs [1] [2] didn't have any explanation on what the Mandatory and Default flags were. Up until then the de facto standard was what was in libebml. The explanation that came later did not match that since a mandatory element would have a 0/"" default value, at least with a zero length. And apparently it's the same in libavformat. So I wouldn't take this text as the golden standard.

WebM which is supposed to be a stricter spec also gives no explanation, relying on us to set it straight. The website was updated in 2009/2010 with more technical details especially for the launch of WebM. I can't remember if I did it or someone else (at CoreCodec?) did it. But it's possible the interpretation was wrong.

[1] https://web.archive.org/web/20090417125242/http://www.matroska.org/technical/specs/index.html
[2] https://web.archive.org/web/20090423065829/http://www.matroska.org/technical/specs/notes.html

@dericed
Copy link
Contributor

dericed commented Feb 7, 2021

I tried to identify where this discrepancy was introduced and I think it's in #40. At https://github.com/ietf-wg-cellar/ebml-specification/pull/40/files#diff-899095530eee35dcac20d8b6f2fbdf367fd0f05b76e52e54e8fe31910ad7df93R197 I used the phrase If an Element is mandatory when describing the interpretation of default values. Also this section https://github.com/ietf-wg-cellar/ebml-specification/pull/40/files#diff-899095530eee35dcac20d8b6f2fbdf367fd0f05b76e52e54e8fe31910ad7df93R212 discusses handling of defaults only in the context of mandatory elements. So unfortunately this discrepancy has been in place since 2015.

@robUx4
Copy link
Contributor

robUx4 commented Feb 7, 2021

Interrestingly @mjbshaw pointed at the issue of interpreted as a zero-value if the length is 0: https://github.com/ietf-wg-cellar/ebml-specification/pull/40/files/4d9606b886fbd92f6a7a994d122b375a46a43325#r46760699
It doesn't seem to match the text that was merged though. Ironically that interpretation was the one found in libebml.

There was a discussion on the mailing list https://lists.matroska.org/pipermail/matroska-devel/2015-October/004838.html

My interpretation of the Mandatory field was already that if the element is not mandatory and has its default value, the element can be written with a size of zero (contradicting libebml). If it's not written there is no element and no value to use.

@hubblec4
Copy link

hubblec4 commented Feb 8, 2021

I fully agree with @robUx4.
The mandatory flag works independent from the DefaultValue attribute and wise versa.
Only the mandatory flag signals if an element must be written.
A mandatory element can only be omit when there is a DefaultValue attribute.

@rcombs
Copy link

rcombs commented Feb 21, 2021

It sounds like once all's said and done, this line will need to be updated on the matroska.org site: https://github.com/Matroska-Org/infrastructure/blob/5c235294cd9c8d7dc847ef1855bae3c2e48de569/website/transforms/ebml_schema2spectable.xsl#L17

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants