RE: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and skip

From: Ming Qian
Date: Thu Sep 09 2021 - 21:50:42 EST


> -----Original Message-----
> From: Nicolas Dufresne [mailto:nicolas@xxxxxxxxxxxx]
> Sent: Friday, September 10, 2021 3:54 AM
> To: Ming Qian <ming.qian@xxxxxxx>; mchehab@xxxxxxxxxx;
> shawnguo@xxxxxxxxxx; robh+dt@xxxxxxxxxx; s.hauer@xxxxxxxxxxxxxx
> Cc: hverkuil-cisco@xxxxxxxxx; kernel@xxxxxxxxxxxxxx; festevam@xxxxxxxxx;
> dl-linux-imx <linux-imx@xxxxxxx>; Aisheng Dong <aisheng.dong@xxxxxxx>;
> linux-media@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> devicetree@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error and
> skip
>
> Caution: EXT Email
>
> Le jeudi 09 septembre 2021 à 03:13 +0000, Ming Qian a écrit :
> > > -----Original Message-----
> > > From: Nicolas Dufresne [mailto:nicolas@xxxxxxxxxxxx]
> > > Sent: Wednesday, September 8, 2021 9:33 PM
> > > To: Ming Qian <ming.qian@xxxxxxx>; mchehab@xxxxxxxxxx;
> > > shawnguo@xxxxxxxxxx; robh+dt@xxxxxxxxxx; s.hauer@xxxxxxxxxxxxxx
> > > Cc: hverkuil-cisco@xxxxxxxxx; kernel@xxxxxxxxxxxxxx;
> > > festevam@xxxxxxxxx; dl-linux-imx <linux-imx@xxxxxxx>; Aisheng Dong
> > > <aisheng.dong@xxxxxxx>; linux-media@xxxxxxxxxxxxxxx;
> > > linux-kernel@xxxxxxxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx;
> > > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > > Subject: [EXT] Re: [PATCH v8 04/15] media:Add v4l2 event codec_error
> > > and skip
> > >
> > > Caution: EXT Email
> > >
> > > Hi Ming,
> > >
> > > more API only review.
> > >
> > > Le mardi 07 septembre 2021 à 17:49 +0800, Ming Qian a écrit :
> > > > The codec_error event can tell client that there are some error
> > > > occurs in the decoder engine.
> > > >
> > > > The skip event can tell the client that there are a frame has been
> > > > decoded, but it won't be outputed.
> > > >
> > > > Signed-off-by: Ming Qian <ming.qian@xxxxxxx>
> > > > Signed-off-by: Shijie Qin <shijie.qin@xxxxxxx>
> > > > Signed-off-by: Zhou Peng <eagle.zhou@xxxxxxx>
> > > > ---
> > > > .../userspace-api/media/v4l/vidioc-dqevent.rst | 12
> ++++++++++++
> > > > include/uapi/linux/videodev2.h | 2 ++
> > > > 2 files changed, 14 insertions(+)
> > > >
> > > > diff --git
> > > > a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > index 6eb40073c906..87d40ad25604 100644
> > > > --- a/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > +++ b/Documentation/userspace-api/media/v4l/vidioc-dqevent.rst
> > > > @@ -182,6 +182,18 @@ call.
> > > > the regions changes. This event has a struct
> > > > :c:type:`v4l2_event_motion_det`
> > > > associated with it.
> > > > + * - ``V4L2_EVENT_CODEC_ERROR``
> > > > + - 7
> > > > + - This event is triggered when some error occurs inside the
> > > > + codec
> > > engine,
> > > > + usually it can be replaced by a POLLERR event, but in some
> > > > + cases,
> > > > the
> > > POLLERR
> > > > + may cause the application to exit, but this event can allow
> > > > + the
> > > application to
> > > > + handle the codec error without exiting.
> > >
> > > Events are sent to userspace in a separate queue from the VB2 queue.
> > > Which means it's impossible for userspace to know where this error
> > > actually took place.
> > > Userspace may endup discarding valid frames from the VB queue, as it
> > > does not know which one are good, and which one are bad.
> > >
> > > There is likely a bit of spec work to be done here for non-fatal
> > > decode errors, but I think the right approach is to use
> > > V4L2_BUF_FLAG_ERROR. What we expect from decoders is that for each
> > > frame, a CAPTURE buffer is assigned.
> > > If
> > > decoding that frame was not possible but the error is recoverable
> > > (corrupted bitstream, missing reference, etc.), then the failing
> > > frame get marked with FLAG_ERROR and decoding continues as usual.
> > >
> > > What isn't documented is that you can set bytesused to 0, meaning
> > > there is nothing useful in that frame, or a valid bytesused when you
> > > know only some blocks are broken (e.g. missing 1 ref). Though,
> > > GStreamer might be the only implementation of that, and byteused 0 may
> confuse some existing userspace.
> > >
> >
> > Hi Nicolas,
> > We don't use this event to tell userspace which frame is broken.
> > Actually it tries to tell userspace that the decoder is abnormal and
> > there will be no more frames output. The usersapce shouldn't wait, it
> > can reset the decoder instance if it wants to continue decoding more
> > frames, or it can exit directly.
> > Usually there will no capture buffer can be dequeued, so we can't
> > set a V4L2_BUF_FLAG_ERROR flag to a capture buffer.
>
> That is not logical, if userspace asked to decode a buffer, but didn't queue back
> any CAPTURE buffer, you are expected to just sit there and wait.
>
> > In my opinion, setting bytesused to 0 means eos, and as you say,
> > it may confuse some existing userspace.
>
> Byteused 0 only mean EOS for one specific driver, MFC. That behaviour was
> kept to avoid breaking existing userspace. In fact, you have to opt in, the
> framework will prevent you from using it for that purpose.
>
> > I think it can be replaced by POLLERR in most of case, but we meet
> > some applications who prefer to use this event instead of pollerr
>
> In general, recoverable errors should be handled without the need for
> userspace to reset. This looks more like you have a bug in your error handling
> and deffer it to userspace. Most userspace will just abort and report to users, I
> doubt this is really what you expect.
>
> What matters for recoverable errors is that you keep consuming OUTPUT
> buffers.
> And userspace should be happy with never getting anything from the CAPTURE
> till the propblem was recovered by the driver. Of course, userspace should
> probably garbage collect the metadata it might be holding, chromium does
> that with a leaky queue of 16 metadata buffer notably
>
> My recommandation would be to drop this for now, and just try to not stall on
> errors (or make it a hard failure for now, pollerr, or ioctl errors).
>

OK, I agree with you, I will drop it

> >
> >
> > > > + * - ``V4L2_EVENT_SKIP``
> > > > + - 8
> > > > + - This event is triggered when one frame is decoded, but it
> > > > + won't
> > > > be
> > > outputed
> > > > + to the display. So the application can't get this frame, and
> > > > + the
> > > > input
> > > frame count
> > > > + is dismatch with the output frame count. And this evevt is
> > > > + telling
> > > > the
> > > client to
> > > > + handle this case.
> > >
> > > Similar to my previous comment, this event is flawed, since
> > > userspace cannot know were the skip is located in the queued
> > > buffers. Currently, all decoders are mandated to support
> > > V4L2_BUF_FLAG_TIMESTAMP_COPY. The timestamp must NOT be
> interpreted
> > > by the driver and must be reproduce as-is in the associated CAPTURE
> > > buffer. It is possible to "garbage" collect skipped frames with this
> > > method, though tedious.
> > >
> > > An alternative, and I think it would be much nicer then this, would
> > > be to use the v4l2_buffer.sequence counter, and just make it skip 1
> > > on skips. Though, the down side is that userspace must also know how
> > > to reorder frames (a driver job for stateless codecs) in order to
> > > identify which frame was skipped. So this is perhaps not that
> > > useful, other then knowing something was skipped in the past.
> > >
> > > A third option would be to introduce V4L2_BUF_FLAG_SKIPPED. This way
> > > the driver could return an empty payload (bytesused = 0) buffer with
> > > this flag set, and the proper timestamp properly copied. This would
> > > let the driver communicate skipped frames in real-time. Note that
> > > this could break with existing userspace, so it would need to be
> > > opted-in somehow (a control or some flags).
> >
> > Hi Nicolas,
> > The problem we meet is that userspace doesn't care which frame is
> > skipped, it just need to know that there are a frame is skipped, the
> > driver should promise the input frame count is equals to the output frame
> count.
> > Your first method is possible in theory, but we find the timestamp
> > may be unreliable, we meet many timestamp issues that userspace may
> > enqueue invalid timestamp or repeated timestamp and so on, so we can't
> accept this solution.
>
> The driver should not interpret the provided timestamp, so it should not be
> able to say if the timestamp is valid or not, this is not the driver's task.
>
> The driver task is to match the timestamp to the CAPTURE buffer (if that buffer
> was produced), and reproduce it exactly.
>
> > I think your second option is better. And there are only 1
> > question, we find some application prefer to use the V4L2_EVENT_EOS to
> > check the eos, not checking the empty buffer, if we use this method to
> > check skipped frame, the
>
> Checking the empty buffer is a legacy method, only available in Samsung MFC
> driver. The spec says that the last buffer should be flagged with _LAST, and any
> further attempt to poll should unblock and DQBUF return EPIPE.
>
> > application should check empty buffer instead of V4L2_EVENT_EOS,
> > otherwise if the last frame is skipped, the application will miss it.
> > Of course this is not a problem, it just increases the complexity of
> > the userspace implementation
>
> The EPIPE mechanism covers this issue, which we initially had with the LAST
> flag.
>
> > I don't think your third method is feasible, the reasons are as below
> > 1. usually the empty payload means eos, and as you say,
> > it may introduce confusion.
> > 2. The driver may not have the opportunity to return an empty
> > payload during decoding, in our driver, driver will pass the capture
> > buffer to firmware, and when some frame is skipped, the firmware won't
> > return the buffer, driver may not find an available capture buffer to
> > return to userspace.
> >
> > The requirement is that userspace need to match the input frame
> > count and output frame count. It doesn't care which frame is skipped,
> > so the V4L2_EVENT_SKIP is the easiest way for driver and userspace.
> > If you think this event is really inappropriate, I prefer to adopt
> > your second option
>
> Please, drop SKIP from you driver and this patchset and fix your draining
> process handling to follow the spec. The Samsung OMX component is
> irrelevant to mainline submission, the OMX code should be updated to follow
> the spec.
>

OK, I'll drop it, and follow your second option that use the v4l2_buffer.sequence counter.

My previous statement about empty buffer was not appropriate.
I know the last buffer should be flagged with _LAST, and in most of case,
driver will append a empty buffer with _LAST flag.
But in userspace, I find some application will check the bytesused, if it's 0, the application will think it's eos
I checked the gstreamer v4l2 decoder, I found the following code:
/* Legacy M2M devices return empty buffer when drained */
if (size == 0 && GST_V4L2_IS_M2M (obj->device_caps))
goto eos;
and I found the ffmpeg v4l2 decoder does the similar thing.

So I don't want to use empty buffer except the last buffer in the driver.

> >
> > >
> > > > * - ``V4L2_EVENT_PRIVATE_START``
> > > > - 0x08000000
> > > > - Base event number for driver-private events.
> > > > diff --git a/include/uapi/linux/videodev2.h
> > > > b/include/uapi/linux/videodev2.h index 5bb0682b4a23..c56640d42dc5
> > > > 100644
> > > > --- a/include/uapi/linux/videodev2.h
> > > > +++ b/include/uapi/linux/videodev2.h
> > > > @@ -2369,6 +2369,8 @@ struct v4l2_streamparm {
> > > > #define V4L2_EVENT_FRAME_SYNC 4
> > > > #define V4L2_EVENT_SOURCE_CHANGE 5
> > > > #define V4L2_EVENT_MOTION_DET 6
> > > > +#define V4L2_EVENT_CODEC_ERROR 7
> > > > +#define V4L2_EVENT_SKIP 8
> > > > #define V4L2_EVENT_PRIVATE_START 0x08000000
> > > >
> > > > /* Payload for V4L2_EVENT_VSYNC */
> > >
> >
>