Re: [RFC PATCH] media: docs-rst: Document m2m stateless video decoder interface

From: Tomasz Figa
Date: Wed Oct 03 2018 - 06:14:19 EST


On Tue, Sep 18, 2018 at 5:02 PM Alexandre Courbot <acourbot@xxxxxxxxxxxx> wrote:
>
> Hi Hans, sorry for the late reply.
>
> On Tue, Sep 11, 2018 at 6:09 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> >
> > On 09/11/18 10:40, Alexandre Courbot wrote:
> > > On Mon, Sep 10, 2018 at 9:49 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > >>
> > >> On 09/10/2018 01:57 PM, Hans Verkuil wrote:
> > >>> On 09/10/2018 01:25 PM, Hans Verkuil wrote:
> > >>>>> +
> > >>>>> +Decoding
> > >>>>> +========
> > >>>>> +
> > >>>>> +For each frame, the client is responsible for submitting a request to which the
> > >>>>> +following is attached:
> > >>>>> +
> > >>>>> +* Exactly one frame worth of encoded data in a buffer submitted to the
> > >>>>> + ``OUTPUT`` queue,
> > >>>>> +* All the controls relevant to the format being decoded (see below for details).
> > >>>>> +
> > >>>>> +``CAPTURE`` buffers must not be part of the request, but must be queued
> > >>>>> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> > >>>>> +decode the frame into it. Although the client has no control over which
> > >>>>> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> > >>>>> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> > >>>>> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> > >>>>> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> > >>>>> +
> > >>>>> +If the request is submitted without an ``OUTPUT`` buffer or if one of the
> > >>>>> +required controls are missing, then :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return
> > >>>>> +``-EINVAL``.
> > >>>>
> > >>>> Not entirely true: if buffers are missing, then ENOENT is returned. Missing required
> > >>>> controls or more than one OUTPUT buffer will result in EINVAL. This per the latest
> > >>>> Request API changes.
> > >>>>
> > >>>> Decoding errors are signaled by the ``CAPTURE`` buffers being
> > >>>>> +dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag.
> > >>>>
> > >>>> Add here that if the reference frame had an error, then all other frames that refer
> > >>>> to it should also set the ERROR flag. It is up to userspace to decide whether or
> > >>>> not to drop them (part of the frame might still be valid).
> > >>>>
> > >>>> I am not sure whether this should be documented, but there are some additional
> > >>>> restrictions w.r.t. reference frames:
> > >>>>
> > >>>> Since decoders need access to the decoded reference frames there are some corner
> > >>>> cases that need to be checked:
> > >>>>
> > >>>> 1) V4L2_MEMORY_USERPTR cannot be used for the capture queue: the driver does not
> > >>>> know when a malloced but dequeued buffer is freed, so the reference frame
> > >>>> could suddenly be gone.
> > >>>>
> > >>>> 2) V4L2_MEMORY_DMABUF can be used, but drivers should check that the dma buffer is
> > >>>> still available AND increase the dmabuf refcount while it is used by the HW.
> > >>>>
> > >>>> 3) What to do if userspace has requeued a buffer containing a reference frame,
> > >>>> and you want to decode a B/P-frame that refers to that buffer? We need to
> > >>>> check against that: I think that when you queue a capture buffer whose index
> > >>>> is used in a pending request as a reference frame, than that should fail with
> > >>>> an error. And trying to queue a request referring to a buffer that has been
> > >>>> requeued should also fail.
> > >>>>
> > >>>> We might need to add some support for this in v4l2-mem2mem.c or vb2.
> > >>>>
> > >>>> We will have similar (but not quite identical) issues with stateless encoders.
> > >>>
> > >>> Related to this is the question whether buffer indices that are used to refer
> > >>> to reference frames should refer to the capture or output queue.
> > >>>
> > >>> Using capture indices works if you never queue more than one request at a time:
> > >>> you know exactly what the capture buffer index is of the decoded I-frame, and
> > >>> you can use that in the following requests.
> > >>>
> > >>> But if multiple requests are queued, then you don't necessarily know to which
> > >>> capture buffer an I-frame will be decoded, so then you can't provide this index
> > >>> to following B/P-frames. This puts restrictions on userspace: you can only
> > >>> queue B/P-frames once you have decoded the I-frame. This might be perfectly
> > >>> acceptable, though.
> > >
> > > IIUC at the moment we are indeed using CAPTURE buffer indexes, e.g:
> > >
> > > .. flat-table:: struct v4l2_ctrl_mpeg2_slice_params
> > > ..
> > > - ``backward_ref_index``
> > > - Index for the V4L2 buffer to use as backward reference, used
> > > with
> > > B-coded and P-coded frames.
> > >
> > > So I wonder how is the user-space currently exercising Cedrus doing
> > > here? Waiting for each frame used as a reference to be dequeued?
> >
> > No, the assumption is (if I understand correctly) that userspace won't
> > touch the memory of the dequeued reference buffer so HW can just point
> > to it.
> >
> > Paul, please correct me if I am wrong.
> >
> > What does chromeOS do?
>
> At the moment Chrome OS (using the config store) queues the OUTPUT and
> CAPTURE buffers together, i.e. in the same request. The CAPTURE buffer
> is not tied to the request in any way, but what seems to matter here
> is the queue order. If drivers process CAPTURE drivers sequentially,
> then you can know which CAPTURE buffer will be used for the request.
>
> The corollary of that is that CAPTURE buffers cannot be re-queued
> until they are not referenced anymore, something the Chrome OS
> userspace also takes care of. Would it be a problem to make this the
> default expectation instead of having the kernel check and reorder
> CAPTURE buffers? The worst that can happen AFAICT is is frame
> corruption, and processing queued CAPTURE buffers sequentially would
> allow us to use the V4L2 buffer ID to reference frames. That's still
> the most intuitive way to do, using relative frame indexes (i.e. X
> frames ago) adds complexity and the potential for misuse and bugs.

+1

The stateless API delegates the reference frame management to the
client and I don't see why we should be protecting the client from
itself. In particular, as I suggested in another email, there can be
valid cases where requeuing CAPTURE buffers while still on the
reference list is okay.

Best regards,
Tomasz