Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface

From: Paul Kocialkowski
Date: Fri Oct 12 2018 - 07:52:09 EST


Hi,

Le mardi 09 octobre 2018 Ã 16:36 +0900, Tomasz Figa a Ãcrit :
> On Sat, Oct 6, 2018 at 2:09 AM Paul Kocialkowski <contact@xxxxxxxx> wrote:
> > Hi,
> >
> > Le jeudi 04 octobre 2018 Ã 14:10 -0400, Nicolas Dufresne a Ãcrit :
> > > Le jeudi 04 octobre 2018 Ã 14:47 +0200, Paul Kocialkowski a Ãcrit :
> > > > > + Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > > > + matrix to use when decoding the next queued frame. Applicable to the H.264
> > > > > + stateless decoder.
> > > > > +
> > > > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > > >
> > > > Ditto with "H264_SLICE_PARAMS".
> > > >
> > > > > + Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > > > > + entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > > > > + Applicable to the H.264 stateless decoder.
> > > > > +
> > > > > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > > > > + Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > > > > + decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > > > > + decoder.
> > > >
> > > > Since we require all the macroblocks to decode one frame to be held in
> > > > the same OUTPUT buffer, it probably doesn't make sense to keep
> > > > DECODE_PARAM and SLICE_PARAM distinct.
> > > >
> > > > I would suggest merging both in "SLICE_PARAMS", similarly to what I
> > > > have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
> > > >
> > > > What do you think?
> > >
> > > I don't understand why we add this arbitrary restriction of "all the
> > > macroblocks to decode one frame". The bitstream may contain multiple
> > > NALs per frame (e.g. slices), and stateless API shall pass each NAL
> > > separately imho. The driver can then decide to combine them if needed,
> > > or to keep them seperate. I would expect most decoder to decode each
> > > slice independently from each other, even though they write into the
> > > same frame.
> >
> > Well, we sort of always assumed that there is a 1:1 correspondency
> > between request and output frame when implemeting the software for
> > cedrus, which simplified both userspace and the driver. The approach we
> > have taken is to use one of the slice parameters for the whole series
> > of slices and just append the slice data.
> >
> > Now that you bring it up, I realize this is an unfortunate decision.
> > This may have been the cause of bugs and limitations with our driver
> > because the slice parameters may very well be distinct for each slice.
>
> I might be misunderstanding something, but, at least for the H.264
> API, there is no relation between the number of buffers/requests and
> number of slice parameters. The V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> is an array, with each element describing each slice in the OUTPUT
> buffer. So actually, it could be up to the userspace if it want to
> have 1 OUTPUT buffer per slice or all slices in 1 OUTPUT buffer - the
> former would have v4l2_ctrl_h264_decode_param::num_slices = 1 and only
> one valid element in V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS.

It seems that we have totally missed that when implementing H.264
support for Cedrus and did not do anything similar for other codecs.

I think grouping slices in the same OUTPUT buffer and providing offsets
for each slice from each element of an array of SLICE_PARAMS controls
makes a lot of sense.

This way we don't have to have one OUTPUT buffer per slice, which
simplifies things a lot. Also, not having one request per slice will
probably speed things up.

However, I'm a bit confused when looking at the chromeos-4.4 code. It
seems that RK3288 only takes the parameters from the first slice (it's
not using DECODE_PARAM's num_slices). Also, RK3399 doesn't seem to use
the slice params at all. I'm really curious to understand how it works
for Rockchip. Perhaps someone has some insight about this?

Also, I'm not sure we have converged on a solution for the Rockchip VPU
requiring the start code before the coded data. Given that each slice
should be handled separately, does it mean the start code has to be
repeated for each?

> > Moreover, I suppose that just appending the slices data implies that
> > they are coded in the same order as the picture, which is probably
> > often the case but certainly not anything guaranteed.
>
> Again, at least in the H.264 API being proposed here, the order of
> slices is not specified by the order of slice data in the buffer. Each
> entry of the V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS array points to the
> specific offset within the buffer.

Sounds good to me.

Cheers,

Paul

--
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

Attachment: signature.asc
Description: This is a digitally signed message part