Re: [PATCH] media: docs-rst: Document m2m stateless video decoder interface
From: Tomasz Figa
Date: Thu Jan 24 2019 - 04:02:26 EST
On Thu, Jan 24, 2019 at 5:59 PM Paul Kocialkowski
<paul.kocialkowski@xxxxxxxxxxx> wrote:
>
> Hi,
>
> On Thu, 2019-01-24 at 17:07 +0900, Tomasz Figa wrote:
> > On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
> > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > Hi Alex,
> > >
> > > On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > > > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > > > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > > > Hi Paul,
> > > > > >
> > > > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > > > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Thanks for this new version! I only have one comment left, see below.
> > > > > > >
> > > > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > > > Documents the protocol that user-space should follow when
> > > > > > > > communicating with stateless video decoders.
> > > > > > > >
> > > > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > > > should probably still be considered staging for a short while.
> > > > > > > >
> > > > > > > > Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > > > >
> > > > > > > > * Included Tomasz and Hans feedback,
> > > > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > > > * Use the tags API.
> > > > > > > >
> > > > > > > > Documentation/media/uapi/v4l/dev-codec.rst | 5 +
> > > > > > > > .../media/uapi/v4l/dev-stateless-decoder.rst | 399 ++++++++++++++++++
> > > > > > > > 2 files changed, 404 insertions(+)
> > > > > > > > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > >
> > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > > > @@ -6,6 +6,11 @@
> > > > > > > > Codec Interface
> > > > > > > > ***************
> > > > > > > >
> > > > > > > > +.. toctree::
> > > > > > > > + :maxdepth: 1
> > > > > > > > +
> > > > > > > > + dev-stateless-decoder
> > > > > > > > +
> > > > > > > > A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > > > > video data from one format into another format, in memory. Typically
> > > > > > > > such devices are memory-to-memory devices (i.e. devices with the
> > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..7a781c89bd59
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > > > @@ -0,0 +1,399 @@
> > > > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > > > +
> > > > > > > > +.. _stateless_decoder:
> > > > > > > > +
> > > > > > > > +**************************************************
> > > > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > > > +**************************************************
> > > > > > > > +
> > > > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > > > +has to do is to provide the raw encoded stream.
> > > > > > > > +
> > > > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > > > +consistent decoding state.
> > > > > > > > +
> > > > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > > > +
> > > > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > > > + ``OUTPUT`` queue,
> > > > > > > > +
> > > > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > > > + and ``CAPTURE`` queues,
> > > > > > > > +
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > +Decoding
> > > > > > > > +========
> > > > > > > > +
> > > > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > > > +following is attached:
> > > > > > > > +
> > > > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > > > + ``OUTPUT`` queue,
> > > > > > >
> > > > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > > > eventually), this requirement should be dropped because metadata is
> > > > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > > > support.
> > > > > > >
> > > > > > > I think it would be safer to mention something like filling the output
> > > > > > > buffer with the minimum unit size for the selected output format, to
> > > > > > > which the associated metadata applies.
> > > > > >
> > > > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > > > > 1) There are streams that can have even 32 slices. With that, you
> > > > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > > > > 2) The Rockchip hardware which seems to just pick all the slices one
> > > > > > after another and which was the reason to actually put the slice data
> > > > > > in the buffer like that.
> > > > > > 3) Not all the metadata is per-slice. Actually most of the metadata
> > > > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > > > per-slice. The corresponding control is an array, which has an entry
> > > > > > for each slice in the buffer. Each entry includes an offset field,
> > > > > > which points to the place in the buffer where the slice is located.
> > > > >
> > > > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > > > is that the spec should specify that "at least the minimum unit size
> > > > > for decoding should be passed in a buffer" (that's maybe not the
> > > > > clearest wording), instead of "one frame worth of".
> > > > >
> > > > > I certainly don't mean to say that each slice should be held in a
> > > > > separate buffer and totally agree with all the points you're making :)
> > > >
> > > > Thanks for clarifying. I will update the document and post v3 accordingly.
> > > >
> > > > > I just think we should still allow userspace to pass slices with a
> > > > > finer granularity than "all the slices required for one frame".
> > > >
> > > > I'm afraid that doing so could open the door to some ambiguities. If
> > > > you allow that, then are you also allowed to send more than one frame
> > > > if the decode parameters do not change? How do drivers that only
> > > > support full frames react when handled only parts of a frame?
> > >
> > > IIRC the ability to pass individual slices was brought up regarding a
> > > potential latency benefit, but I doubt it would really be that
> > > significant.
> > >
> > > Thinking about it with the points you mentionned in mind, I guess the
> > > downsides are much more significant than the potential gain.
> > >
> > > So let's stick with requiring all the slices for a frame then!
> >
> > Ack.
> >
> > My view is that we can still loosen this requirement in the future,
> > possibly behind some driver capability flag, but starting with a
> > simpler API, with less freedom to the applications and less
> > constraints on hardware support sounds like a better practice in
> > general.
>
> Sounds good, a capability flag would definitely make sense for that.
>
> > > > > Side point: After some discussions with Thierry Reading, who's looking
> > > > > into the the Tegra VPU (also stateless), it seems that using the annex-
> > > > > b format for h.264 would be best for everyone. So that means including
> > > > > the start code, NAL header and "raw" slice data. I guess the same
> > > > > should apply to other codecs too. But that should be in the associated
> > > > > pixfmt spec, not in this general document.
> > > > >
> > > > > What do yout think?
> >
> > Hmm, wouldn't that effectively make it the same as V4L2_PIX_FMT_H264?
>
> Well, this would only concern the slice NAL unit. As far as I
> understood, V4L2_PIX_FMT_H264 takes all sorts of NAL units.
>
Ah, passing only slice NAL units makes much more sense indeed.
> > By the way, I proposed it once, some time ago, but it was rejected
> > because VAAPI didn't get the full annex B stream and a V4L2 stateless
> > VAAPI backend would have to reconstruct the stream.
>
> Oh, right I remember. After a close look, this is apparently not the
> case, according to the VAAPI docs at:
> http://intel.github.io/libva/structVASliceParameterBufferH264.html
>
> Also looking at ffmpeg, VAAPI and VDPAU seem to pass the same data,
> except that VDPAU adds a start code prefix (which IIRC is required by
> the rockchip decoder):
>
> - VAAPI: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vaapi_h264.c#L331
> - VDPAU: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/vdpau_h264.c#L182
>
> So I was initially a bit reluctant to make it part of the spec that the
> full slice NAL should be passed since that would imply geting the NAL
> header info both in parsed form through the control and in raw form
> along with the slice data. But it looks like it might be rather common
> for decoders to require this if the tegra decoder also needs it.
If so, I think it makes perfect sense indeed.
Best regards,
Tomasz