Re: [PATCH] media: docs-rst: Document m2m stateless video decoder interface

From: Tomasz Figa
Date: Thu Jan 24 2019 - 03:07:58 EST


On Wed, Jan 23, 2019 at 7:42 PM Paul Kocialkowski
<paul.kocialkowski@xxxxxxxxxxx> wrote:
>
> Hi Alex,
>
> On Wed, 2019-01-23 at 18:43 +0900, Alexandre Courbot wrote:
> > On Tue, Jan 22, 2019 at 7:10 PM Paul Kocialkowski
> > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > Hi,
> > >
> > > On Tue, 2019-01-22 at 17:19 +0900, Tomasz Figa wrote:
> > > > Hi Paul,
> > > >
> > > > On Fri, Dec 7, 2018 at 5:30 PM Paul Kocialkowski
> > > > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks for this new version! I only have one comment left, see below.
> > > > >
> > > > > On Wed, 2018-12-05 at 19:01 +0900, Alexandre Courbot wrote:
> > > > > > Documents the protocol that user-space should follow when
> > > > > > communicating with stateless video decoders.
> > > > > >
> > > > > > The stateless video decoding API makes use of the new request and tags
> > > > > > APIs. While it has been implemented with the Cedrus driver so far, it
> > > > > > should probably still be considered staging for a short while.
> > > > > >
> > > > > > Signed-off-by: Alexandre Courbot <acourbot@xxxxxxxxxxxx>
> > > > > > ---
> > > > > > Removing the RFC flag this time. Changes since RFCv3:
> > > > > >
> > > > > > * Included Tomasz and Hans feedback,
> > > > > > * Expanded the decoding section to better describe the use of requests,
> > > > > > * Use the tags API.
> > > > > >
> > > > > > Documentation/media/uapi/v4l/dev-codec.rst | 5 +
> > > > > > .../media/uapi/v4l/dev-stateless-decoder.rst | 399 ++++++++++++++++++
> > > > > > 2 files changed, 404 insertions(+)
> > > > > > create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > >
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-codec.rst b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > index c61e938bd8dc..3e6a3e883f11 100644
> > > > > > --- a/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-codec.rst
> > > > > > @@ -6,6 +6,11 @@
> > > > > > Codec Interface
> > > > > > ***************
> > > > > >
> > > > > > +.. toctree::
> > > > > > + :maxdepth: 1
> > > > > > +
> > > > > > + dev-stateless-decoder
> > > > > > +
> > > > > > A V4L2 codec can compress, decompress, transform, or otherwise convert
> > > > > > video data from one format into another format, in memory. Typically
> > > > > > such devices are memory-to-memory devices (i.e. devices with the
> > > > > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > new file mode 100644
> > > > > > index 000000000000..7a781c89bd59
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > > > > @@ -0,0 +1,399 @@
> > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > +
> > > > > > +.. _stateless_decoder:
> > > > > > +
> > > > > > +**************************************************
> > > > > > +Memory-to-memory Stateless Video Decoder Interface
> > > > > > +**************************************************
> > > > > > +
> > > > > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > > > > +between processing frames. This means that each frame is decoded independently
> > > > > > +of any previous and future frames, and that the client is responsible for
> > > > > > +maintaining the decoding state and providing it to the decoder with each
> > > > > > +decoding request. This is in contrast to the stateful video decoder interface,
> > > > > > +where the hardware and driver maintain the decoding state and all the client
> > > > > > +has to do is to provide the raw encoded stream.
> > > > > > +
> > > > > > +This section describes how user-space ("the client") is expected to communicate
> > > > > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > > > > +to stateful codecs, the decoder/client sequence is simpler, but the cost of
> > > > > > +this simplicity is extra complexity in the client which must maintain a
> > > > > > +consistent decoding state.
> > > > > > +
> > > > > > +Stateless decoders make use of the request API and buffer tags. A stateless
> > > > > > +decoder must thus expose the following capabilities on its queues when
> > > > > > +:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked:
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability must be set on the
> > > > > > + ``OUTPUT`` queue,
> > > > > > +
> > > > > > +* The ``V4L2_BUF_CAP_SUPPORTS_TAGS`` capability must be set on the ``OUTPUT``
> > > > > > + and ``CAPTURE`` queues,
> > > > > > +
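Side note for readers following along: with the above, the client can
tell whether a decoder implements this interface by checking the
capability bits returned by VIDIOC_REQBUFS, roughly like below. This is
only a sketch, and V4L2_BUF_CAP_SUPPORTS_TAGS is the flag introduced by
the tags series this document depends on, so it may still change:

  struct v4l2_requestbuffers reqbufs = {
          .count = 4,
          .type = V4L2_BUF_TYPE_VIDEO_OUTPUT,
          .memory = V4L2_MEMORY_MMAP,
  };

  /* Allocate OUTPUT buffers and read back the queue capabilities. */
  if (ioctl(video_fd, VIDIOC_REQBUFS, &reqbufs) < 0)
          return -errno;

  /* Requests are mandatory on the OUTPUT queue of a stateless decoder. */
  if (!(reqbufs.capabilities & V4L2_BUF_CAP_SUPPORTS_REQUESTS))
          return -ENOTSUP;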
> > > > >
> > > > > [...]
> > > > >
> > > > > > +Decoding
> > > > > > +========
> > > > > > +
> > > > > > +For each frame, the client is responsible for submitting a request to which the
> > > > > > +following is attached:
> > > > > > +
> > > > > > +* Exactly one frame worth of encoded data in a buffer submitted to the
> > > > > > + ``OUTPUT`` queue,
> > > > >
> > > > > Although this is still the case in the cedrus driver (but will be fixed
> > > > > eventually), this requirement should be dropped because metadata is
> > > > > per-slice and not per-picture in the formats we're currently aiming to
> > > > > support.
> > > > >
> > > > > I think it would be safer to mention something like filling the output
> > > > > buffer with the minimum unit size for the selected output format, to
> > > > > which the associated metadata applies.
> > > >
> > > > I'm not sure it's a good idea. Some of the reasons why I think so:
> > > > 1) There are streams that can have even 32 slices. With that, you
> > > > instantly run out of V4L2 buffers even just for 1 frame.
> > > > 2) The Rockchip hardware seems to just pick up all the slices one
> > > > after another, which was the reason to actually put the slice data
> > > > in the buffer like that.
> > > > 3) Not all the metadata is per-slice. Actually most of the metadata
> > > > is per frame and only what is located inside v4l2_h264_slice_param is
> > > > per-slice. The corresponding control is an array, which has an entry
> > > > for each slice in the buffer. Each entry includes an offset field,
> > > > which points to the place in the buffer where the slice is located.
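To illustrate what that looks like (field names simplified here, not
the exact uAPI from the pending H.264 series): each array entry
describes one slice and points at its data inside the OUTPUT buffer,

  /* Illustrative layout only. */
  struct h264_slice_param {
          __u32 offset;  /* byte offset of this slice's data in the OUTPUT buffer */
          __u32 size;    /* size of the slice data in bytes */
          /* ... per-slice syntax elements (slice_type, first_mb_in_slice, ...) */
  };

so all the slices of a frame can sit in a single buffer while the
hardware (or the driver) walks through them one by one.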
> > >
> > > Sorry, I realize that my email wasn't very clear. What I meant to say
> > > is that the spec should specify that "at least the minimum unit size
> > > for decoding should be passed in a buffer" (that's maybe not the
> > > clearest wording), instead of "one frame worth of".
> > >
> > > I certainly don't mean to say that each slice should be held in a
> > > separate buffer and totally agree with all the points you're making :)
> >
> > Thanks for clarifying. I will update the document and post v3 accordingly.
> >
> > > I just think we should still allow userspace to pass slices with a
> > > finer granularity than "all the slices required for one frame".
> >
> > I'm afraid that doing so could open the door to some ambiguities. If
> > you allow that, then are you also allowed to send more than one frame
> > if the decode parameters do not change? How do drivers that only
> > support full frames react when handed only parts of a frame?
>
> IIRC the ability to pass individual slices was brought up regarding a
> potential latency benefit, but I doubt it would really be that
> significant.
>
> Thinking about it with the points you mentioned in mind, I guess the
> downsides are much more significant than the potential gain.
>
> So let's stick with requiring all the slices for a frame then!

Ack.

My view is that we can still loosen this requirement in the future,
possibly behind some driver capability flag, but starting with a
simpler API, one that gives applications less freedom and puts fewer
constraints on hardware support, sounds like better practice in
general.
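
For the record, the resulting per-frame flow looks roughly like this
from the client side. This is only a sketch with error paths trimmed;
the codec-specific decode-parameters control is passed in as a
placeholder (params_cid, params), since the stateless codec controls
are still under review:

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/media.h>
  #include <linux/videodev2.h>

  /* Submit one frame: all of its slices already copied into OUTPUT
   * buffer out_index, plus the per-frame metadata, in one request. */
  static int submit_frame(int media_fd, int video_fd, unsigned int out_index,
                          unsigned int frame_bytes, unsigned int params_cid,
                          void *params, unsigned int params_size)
  {
          int req_fd;

          /* Allocate a request on the media device. */
          if (ioctl(media_fd, MEDIA_IOC_REQUEST_ALLOC, &req_fd) < 0)
                  return -errno;

          /* Queue the OUTPUT buffer holding the frame's slices into the request. */
          struct v4l2_buffer buf = {
                  .type = V4L2_BUF_TYPE_VIDEO_OUTPUT,
                  .memory = V4L2_MEMORY_MMAP,
                  .index = out_index,
                  .bytesused = frame_bytes,
                  .flags = V4L2_BUF_FLAG_REQUEST_FD,
                  .request_fd = req_fd,
          };
          if (ioctl(video_fd, VIDIOC_QBUF, &buf) < 0)
                  return -errno;

          /* Attach the decode parameters for this frame to the same request. */
          struct v4l2_ext_control ctrl = {
                  .id = params_cid,
                  .size = params_size,
                  .ptr = params,
          };
          struct v4l2_ext_controls ctrls = {
                  .which = V4L2_CTRL_WHICH_REQUEST_VAL,
                  .request_fd = req_fd,
                  .count = 1,
                  .controls = &ctrl,
          };
          if (ioctl(video_fd, VIDIOC_S_EXT_CTRLS, &ctrls) < 0)
                  return -errno;

          /* Queue the request; the decoded frame will come back on CAPTURE. */
          if (ioctl(req_fd, MEDIA_REQUEST_IOC_QUEUE) < 0)
                  return -errno;

          return req_fd;
  }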

>
> Cheers,
>
> Paul
>
> > > However, it looks like supporting this might be a problem for the
> > > Rockchip decoder. Note that our Allwinner VPU can also process
> > > all slices one after the other, but can be configured for slice-level
> > > granularity while decoding (at least it looks that way).
> > >
> > > Side point: After some discussions with Thierry Reding, who's looking
> > > into the Tegra VPU (also stateless), it seems that using the Annex B
> > > format for H.264 would be best for everyone. So that means including
> > > the start code, NAL header and "raw" slice data. I guess the same
> > > should apply to other codecs too. But that should be in the associated
> > > pixfmt spec, not in this general document.
> > >
> > > What do you think?

Hmm, wouldn't that effectively make it the same as V4L2_PIX_FMT_H264?

By the way, I proposed it once, some time ago, but it was rejected
because VAAPI doesn't get the full Annex B stream, so a VAAPI backend
on top of a V4L2 stateless decoder would have to reconstruct the
stream.
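
For clarity, by "full Annex B stream" I mean each NAL unit kept with
its byte-stream framing, roughly:

  00 00 00 01 (or 00 00 01) | 1-byte NAL header | raw slice payload

rather than slice data with the start codes already stripped.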

Best regards,
Tomasz