RFC changed semantics wrt. default values & non-mandatory elements #394
Comments
libebml's code follows the old interpretation, too. Here is the corresponding bug I filed yesterday. Like I said, this is the RFC changing the established interpretation of the specifications as they had been until the RFC. |
The linked old version even contains an even clearer statement: Default
While the RFC indeed changed this EBML part of the specs, there is an easy solution that does not require reopening the EBML specs: Declare the relevant Matroska elements as mandatory to reinstate the old behaviour. |
Sure, we could go that route. I'm also concerned about existing knowledge out there. For as long as EBML+Matroska has existed, developers have been used to treating elements with default values one way. Now the RFC comes along and changes the semantics very subtly. This will trip many developers up. It tripped me up in the discussion about the new track flags & their default value. |
As said on the libebml side, there aren't that many elements affected. The most notable one being
So I think we can make most of these elements mandatory. |
As for the old spec there is this notable part
You could read it as "you can leave out an element 'only' if it's mandatory". Which contradicts what the Interestingly WebM took our colorimetry table from our spec but doesn't explain how to interpret it. |
The RFC made two changes with respect to default values:
It seems that several parsers (libebml and libavformat) already accepted zero-length elements and interpreted them as containing a value of zero. Such a parser already parses zero-length elements with a default value of zero correctly, so the second point is no problem for them. But if an element is not mandatory, it is nevertheless affected by the first change. So there are more affected elements, namely at least |
And interpreting "Mandatory elements with a default value may be left out of the file." as "you can leave out an element 'only' if it's mandatory" is an interpretation that is at odds with what is said about default values. If there is a self-contradictory and a non-self-contradictory interpretation, the latter is of course to be preferred. (I remember having found it odd years ago that there were elements with a default value that are not marked as mandatory, although these elements are automatically present, but I thought it was down to an oversight. After all, the rules for zero-length elements didn't exist back then, so if an element with a default value that is not physically present may not be assumed to be present, then it made no sense at all for said element to have a default value.) |
Where the RFC deviates from existing code is that a Length of 0 was assumed to be zero (or empty string), rather than the default value. That's regardless of the default value. In hindsight that's also a good use of the feature. It might make some parsing easier. Assuming a non mandatory element to be (virtually) present just because it has a default value is just wrong. Otherwise the mandatory flag is meaningless as soon as you have a default value. But some elements should not be present unless they are actually used (think encryption for example). The original spec notes were misleading in that regard. And as I said the "correct" interpretation is the mandatory explanation that tells if an element can be left out or not. |
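To make the zero-length point concrete, here is a minimal sketch of the two behaviours as described in this thread. The function name `decode_uint` and the `rfc_semantics` switch are made up for illustration; this is not how libebml or libavformat are actually structured.

```python
from typing import Optional


def decode_uint(payload: bytes, default: Optional[int], rfc_semantics: bool) -> int:
    """Decode the payload of an unsigned integer element (big-endian).

    Hypothetical helper: `rfc_semantics` selects between the two readings
    of a zero-length element discussed in this thread.
    """
    if len(payload) == 0:
        # RFC reading (as described above): an empty element takes its
        # declared default value, if it has one.
        if rfc_semantics and default is not None:
            return default
        # Existing-parser reading: an empty element is simply zero.
        return 0
    return int.from_bytes(payload, "big")


# An empty element whose declared default is 1:
old_value = decode_uint(b"", default=1, rfc_semantics=False)
rfc_value = decode_uint(b"", default=1, rfc_semantics=True)
```

The two readings only diverge when the declared default is non-zero, which matches the observation above that parsers treating empty elements as zero already handle zero defaults correctly.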
As for the conflict between Mandatory and Default value, the Mandatory flag is more important than the Default flag. You cannot write a file properly if you don't know what elements are mandatory or not. You can however write a file without knowing any default value at all. You just write all values. |
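As a rough illustration of that argument, a writer can get by with only the Mandatory flags if it writes every value it has. The helper `plan_write` and the element names used here are assumptions for the example, not an existing API.

```python
def plan_write(values: dict, mandatory_names: set) -> list:
    """Return the (name, value) pairs to serialize.

    Hypothetical sketch: the writer knows which elements are mandatory but
    knows no default values, so it never omits anything it was given.
    """
    missing = sorted(mandatory_names - set(values))
    if missing:
        # Without defaults to fall back on, a missing mandatory element
        # cannot be written correctly.
        raise ValueError(f"missing mandatory elements: {missing}")
    # Every supplied value is written explicitly, default or not.
    return list(values.items())


track = {"TrackNumber": 1, "Language": "eng"}
to_write = plan_write(track, mandatory_names={"TrackNumber"})
```

The resulting file is larger than necessary, but a reader never has to consult a default for any element such a writer produced.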
Following the bug/inconsistency found in EBML [1] it might be better to make the track language mandatory. It will give better past/present compatibility. In light of the next Track Selection explanation it's also vital to know the language when you have more than one audio/subtitle track. This may be a problem for video tracks, for which the language was not mandatory and which technically didn't have a defined language. The flag does make sense though, even for video. For example some text on the screen may appear differently between languages. With this change, every track of every file that doesn't have this value set will be assumed to be in English. That's probably what any system dealing with track selection is already doing anyway given it was the official default value. [1] ietf-wg-cellar/ebml-specification#394
Also the original specs [1] [2] didn't have any explanation on what the Mandatory and Default flags were. Up until then the de facto standard was what was in libebml. The explanation that came later did not match that since a mandatory element would have a 0/"" default value, at least with a zero length. And apparently it's the same in libavformat. So I wouldn't take this text as the golden standard. WebM which is supposed to be a stricter spec also gives no explanation, relying on us to set it straight. The website was updated in 2009/2010 with more technical details especially for the launch of WebM. I can't remember if I did it or someone else (at CoreCodec?) did it. But it's possible the interpretation was wrong. [1] https://web.archive.org/web/20090417125242/http://www.matroska.org/technical/specs/index.html |
I tried to identify where this discrepancy was introduced and I think it's in #40. At https://github.com/ietf-wg-cellar/ebml-specification/pull/40/files#diff-899095530eee35dcac20d8b6f2fbdf367fd0f05b76e52e54e8fe31910ad7df93R197 I used the phrase |
Interestingly @mjbshaw pointed at the issue of interpreted as a zero-value if the length is 0: https://github.com/ietf-wg-cellar/ebml-specification/pull/40/files/4d9606b886fbd92f6a7a994d122b375a46a43325#r46760699 There was a discussion on the mailing list https://lists.matroska.org/pipermail/matroska-devel/2015-October/004838.html My interpretation of the Mandatory field was already that |
I fully agree with @robUx4. |
It sounds like once all's said and done, this line will need to be updated on the matroska.org site: https://github.com/Matroska-Org/infrastructure/blob/5c235294cd9c8d7dc847ef1855bae3c2e48de569/website/transforms/ebml_schema2spectable.xsl#L17 |
I've verified using the Wayback machine[1] that right before the website was updated to match the released RFC the semantics of default values was as follows:
Note that there is no distinction made for whether an element is mandatory or not.
Now the RFC turns this into the following:
and
and the whole section 11.1.19, which only talks about mandatory elements.
So to recap:
This is not just theoretical. Pretty much all Matroska readers out there treat mandatory and non-mandatory elements the same wrt. default values. The most popular example is the track language element, which is non-mandatory with a default value of `eng`. Software such as MKVToolNix, VLC, ffmpeg and mkclean all use the value `eng` if the element isn't present.

The RFC retroactively changes the meaning of nearly all files ever created. No matter what we might think about how default values should be handled, such a radical, retroactive change is unacceptable in my opinion. Unfortunately I didn't realize this during the RFC process, so yeah, mea culpa.
[1] https://web.archive.org/web/20190630173942/https://matroska.org/technical/specs/notes.html
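For readers who prefer code, the track language example can be sketched roughly like this. The function `effective_value` and its `rfc_reading` flag are hypothetical and only model the two interpretations as they are described in this issue, not any particular reader's implementation.

```python
from typing import Optional


def effective_value(stored: Optional[str], default: Optional[str],
                    mandatory: bool, rfc_reading: bool) -> Optional[str]:
    """Value a reader would use for an element that may be absent.

    `stored` is None when the element is not physically in the file.
    """
    if stored is not None:
        return stored
    if default is None:
        return None
    if rfc_reading and not mandatory:
        # RFC reading as described in this issue: defaults only apply to
        # mandatory elements, so an absent optional element has no value.
        return None
    # Pre-RFC reading: the default applies whether or not it is mandatory.
    return default


# Track language: non-mandatory, default "eng", absent from the file.
old_lang = effective_value(None, "eng", mandatory=False, rfc_reading=False)
rfc_lang = effective_value(None, "eng", mandatory=False, rfc_reading=True)
# Declaring the element mandatory makes both readings agree again.
fixed_lang = effective_value(None, "eng", mandatory=True, rfc_reading=True)
```

Under these assumptions the old reading yields `eng`, the RFC reading yields no language at all, and marking the element as mandatory brings the RFC reading back in line with what existing software does.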