AV1 OBU Support #28
Comments
Quick answer is yes. Long answer is: what is the value? I use h265nal and h264nal on a regular basis, mostly to understand the contents of Annex B (h265 or h264) streams. I also script around the tools (e.g. see https://github.com/chemag/itools, which uses h265nal to provide information on the SPS colorimetry of HEIC images). Now, there are other parsers that do a similar job. First, ffmpeg's BSF filter does a very similar job. My main issues with ffmpeg are two:
For h265, there is also https://github.com/strukturag/libde265. It is a deeper parser than either h265nal or ffmpeg's BSF. For example, the itools project mentioned above uses a fork of libde265 (https://github.com/chemag/libde265) to analyze the QP values used on a per-CTU, per-frame basis. The main issue IMO is that the author is not interested (see strukturag/libde265#201). Now, what I would like is a parser that goes in both directions: basically, a way to convert a binary format (preferably defined using a structured format for binary definition, like kaitai) into a structured text format, and back. Use cases:
In order to implement this, the idea is to use an intermediate structured format that allows (a) binary to structured-format conversion, and (b) structured-format to text conversion. For the structured format, I like protobuf, which gives you (b) for free by using protobuf-text (I also like the protobuf text format).
This is the Section 4 Figure in https://github.com/chemag/m2pb, where the idea was applied to mpeg2-ts streams. Now, the Parse/Dump functionality in
I like these thoughts, and while my interests are primarily related to implementing Vulkan video extensions, which mandate that apps handle muxing/demuxing and parsing, I think those use cases would benefit as well from your proposed solution. Is this something that has an established effort anywhere? If not, have you considered owning such a project? Regardless, I also like protobufs for this approach and would be interested in joining efforts. Have you looked at Hammer? I tried looking at it briefly, and it seems to implement a software-defined intermediate structure. But from the limited documentation I was unable to determine whether it supports defining grammars capable of handling entropy-encoded values such as the ones present in h264/h265. I have experience with libav and ffmpeg as well, but in addition to some of the reasons you mentioned, I take issue with it because the code base is ancient, and as a result it seems to suffer from many issues, including the somewhat ideological nature of its development as well as the horrid documentation. I think a tool built on modern technologies such as protobuf and modern C++, with an emphasis on code quality and a succinct API, would perform well in this market.
Rereading your last paragraph, it occurs to me that maybe some of my comments were not super clear. Basically, it seems like some existing projects take a similar approach to this problem, but a generic solution may solve this issue for all of these use cases and beyond: one where a developer may bring their own grammar (regardless of the form of such a grammar) and, within a succinct and lightweight framework provided by this solution, supply any additional logic needed by their parsing algorithm. Something like protobufs alone does not allow for the control needed to parse dynamic/entropy-encoded values to my knowledge; please correct me if you know differently.
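To illustrate the kind of value a fixed schema cannot describe by itself: H.264 and H.265 headers use unsigned Exp-Golomb codes, ue(v), whose length depends on the bits being read. Below is a minimal Python sketch of a decoder. The `BitReader` class and the `decode_ue_sequence` helper are hypothetical names made up for this example; they are not part of h264nal, h265nal, Hammer, or protobuf.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Decode one unsigned Exp-Golomb code, ue(v)."""
        # Count leading zero bits up to (and consuming) the first 1 bit.
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        # The same number of suffix bits follows the 1 bit.
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def decode_ue_sequence(data: bytes, count: int) -> list:
    """Decode `count` consecutive ue(v) values from `data`."""
    reader = BitReader(data)
    return [reader.read_ue() for _ in range(count)]


# Bits 1 | 010 | 011 | 00100 (then padding) encode the values 0, 1, 2, 3.
print(decode_ue_sequence(bytes([0xA6, 0x40]), 4))
```

The point of the sketch is that the width of each field is only known while parsing, so a grammar-driven tool would need some hook for this kind of logic on top of a plain message schema.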
I'm not sure exactly what you are describing here ("video extensions which mandate apps handle muxing/demuxing and parsing"), and how h265nal (or an AV1 parser) would work. These parsers are just binary-to-text converters. Right now we're just printing the text values we convert to. My idea is to get the reverse conversion, so as to allow editing the binary streams (Annex B).
I took a look at it. It looks like a layer that facilitates the traditional C parsing approach by defining a series of functions so that you do not have to read byte by byte, and then do hton/ntoh[ls] conversions. What I really want is to be able to feed it a video bitstream syntax. The MPEG formats use the acronym "RBSP", and AV1's is very similar. For example, for AV1, I'd like to start with all the syntax definitions, e.g.
This should autogenerate (a) a parser that accepts a raw (Annex B) AV1 stream and produces a set of protobuf objects representing OBUs and the descendant objects, and (b) a dumper that does the opposite operation. That means that the only work for creating an AV1 parser would be to collect the whole list of syntax definitions from the standard, and then write the skeleton of a full parser/dumper. The closest thing I've seen for this is the kaitai syntax. The syntax is not very nice IMO, but e.g. it allows defining an ethernet header like this:
I think a better syntax would allow a more generic if/then mechanism to drive the parser or set default values. In fact, I think a good solution will start with a better syntax for a language like this.
The parsing process needs to produce something that can be operated upon. My idea of "operation" includes getting text-based versions of that something (so we get the binary-to-text conversion feature), editing that something (changing values, removing items, etc.), and writing back to binary.
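As a rough sketch of that parse/edit/write-back loop, here is a Python example in which a single field table drives both the parser and the dumper. It covers only the first byte of the AV1 OBU header (field names and widths as in the AV1 specification's `obu_header()` syntax). The function names (`parse_fields`, `dump_fields`, `to_text`) are hypothetical, and a plain dict with "name: value" text stands in for the protobuf objects and protobuf text format, which this sketch does not use.

```python
# Field layout of the first OBU header byte: (name, width in bits), MSB first.
OBU_HEADER_FIELDS = [
    ("obu_forbidden_bit", 1),
    ("obu_type", 4),
    ("obu_extension_flag", 1),
    ("obu_has_size_field", 1),
    ("obu_reserved_1bit", 1),
]


def parse_fields(data: bytes, fields) -> dict:
    """Binary -> structured: unpack fixed-width fields into a dict."""
    total_bits = sum(width for _, width in fields)
    nbytes = (total_bits + 7) // 8
    value = int.from_bytes(data[:nbytes], "big") >> (nbytes * 8 - total_bits)
    out = {}
    remaining = total_bits
    for name, width in fields:
        remaining -= width
        out[name] = (value >> remaining) & ((1 << width) - 1)
    return out


def dump_fields(obj: dict, fields) -> bytes:
    """Structured -> binary: pack the dict back using the same table."""
    value = 0
    total_bits = 0
    for name, width in fields:
        if not 0 <= obj[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | obj[name]
        total_bits += width
    nbytes = (total_bits + 7) // 8
    value <<= nbytes * 8 - total_bits  # zero-pad up to a byte boundary
    return value.to_bytes(nbytes, "big")


def to_text(obj: dict) -> str:
    """Structured -> text: one 'name: value' line per field."""
    return "\n".join(f"{name}: {value}" for name, value in obj.items())


header = parse_fields(b"\x12", OBU_HEADER_FIELDS)  # binary -> structured
print(to_text(header))                             # structured -> text
header["obu_type"] = 1                             # edit a value
print(dump_fields(header, OBU_HEADER_FIELDS).hex())  # structured -> binary
```

A real tool would also need conditionals, loops, and variable-length fields, which is where a better definition syntax comes in; the sketch only shows that parser and dumper can share one definition.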
The usual processing in my case does not have strict performance requirements: I typically use small Annex B streams, so I don't mind paying the overhead that protobufs impose.
Sorry to open an issue for this, but I was wondering if you have considered using the work in h264nal and h265nal as the basis for a similar project to parse AV1 OBU structures? If so, I may have a need and would be interested in helping with such a project.
Please close this at your discretion.