Re: [RFC] Stateless codecs: how to refer to reference frames

From: Paul Kocialkowski
Date: Wed Oct 24 2018 - 06:43:58 EST


Hi,

Le vendredi 19 octobre 2018 Ã 11:40 +0200, Hans Verkuil a Ãcrit :
> From Alexandre's '[RFC PATCH v3] media: docs-rst: Document m2m stateless
> video decoder interface':
>
> On 10/19/18 10:09, Alexandre Courbot wrote:
> > Two points being currently discussed have not been changed in this
> > revision due to lack of better idea. Of course this is open to change:
>
> <snip>
>
> > * The other hot topic is the use of capture buffer indexes in order to
> > reference frames. I understand the concerns, but I doesn't seem like
> > we have come with a better proposal so far - and since capture buffers
> > are essentially well, frames, using their buffer index to directly
> > reference them doesn't sound too inappropriate to me. There is also
> > the restriction that drivers must return capture buffers in queue
> > order. Do we have any concrete example where this scenario would not
> > work?
>
> I'll stick to decoders in describing the issue. Stateless encoders probably
> do not have this issue.
>
> To recap: the application provides a buffer with compressed data to the
> decoder. After the request is finished the application can dequeue the
> decompressed frame from the capture queue.
>
> In order to decompress the decoder needs to access previously decoded
> reference frames. The request passed to the decoder contained state
> information containing the buffer index (or indices) of capture buffers
> that contain the reference frame(s).
>
> This approach puts restrictions on the framework and the application:
>
> 1) It assumes that the application can predict the capture indices.
> This works as long as there is a simple relationship between the
> buffer passed to the decoder and the buffer you get back.
>
> But that may not be true for future codecs. And what if one buffer
> produces multiple capture buffers? (E.g. if you want to get back
> decompressed slices instead of full frames to reduce output latency).

I don't really understand how there could be multiple capture buffers
for the same frame used as reference here. I believe the stateless VPU
API should make it a hard requirement that reference frames must be
fully held in a single capture buffer. This is because the hardware
will generally expect one address for the reference frame (that's
definitely the case for Cedrus: we cannot deal with reference frames if
each reference is scattered accross multiple buffers).

So I don't believe it's a problem to associate a reference frame and a
single capture buffer for the stateless VPU case. Rather that it should
be specified as a requriement of the API.

I don't see any problem with allowing multiple output buffers though
(that should then be rendered to the same capture buffer), so the 1:1
relationship between output and capture cannot be assumed either way.

> This API should be designed to be future-proof (within reason of course),
> and I am not at all convinced that future codecs will be just as easy
> to predict.
>
> 2) It assumes that neither drivers nor applications mess with the buffers.
> One case that might happen today is if the DMA fails and a buffer is
> returned marked ERROR and the DMA is retried with the next buffer. There
> is nothing in the spec that prevents you from doing that, but it will mess
> up the capture index numbering. And does the application always know in
> what order capture buffers are queued? Perhaps there are two threads: one
> queueing buffers with compressed data, and the other dequeueing the
> decompressed buffers, and they are running mostly independently.
>
> I believe that assuming that you can always predict the indices of the
> capture queue is dangerous and asking for problems in the future.

I agree, assuming that userspace can predict the matching capture
buffer index seems very fragile.

> I am very much in favor of using a dedicated cookie. The application sets
> it for the compressed buffer and the driver copies it to the uncompressed
> capture buffer. It keeps track of the association between capture index
> and cookie. If a compressed buffer decompresses into multiple capture
> buffers, then they will all be associated with the same cookie, so
> that simplifies how you refer to reference frames if they are split
> over multiple buffers.

This seems like a great idea, I'm all for it!

> The codec controls refer to reference frames by cookie(s).
>
> For existing applications that use the capture index all you need to do
> is to set the capture index as the cookie value in the output buffer.
>
> It is my understanding that ChromeOS was using the timestamp as the
> cookie value.
>
> I have thought about that, but I am not in favor of doing that. One
> reason is that struct timeval comes in various flavors (32 bit, 64 bit,
> and a y2038-compatible 32-bit type in the future).
>
> The y2038 compat code that we will need concerns me in particular since
> it will mean that the timeval is converted from 32 to 64 bit and back
> again, and that might well mangle the data. I'm not so sure if you can
> stick a 64-bit pointer in the timeval (e.g. the high 32 bits go to into
> the tv_sec field, the low 32 bits go to the usecs). The y2038 conversion
> might mangle the tv_usec field (e.g. divide by 1000000 and add the seconds
> to the tv_sec field).
>
> I would really prefer an independent 64-bit cookie value that the application
> can set instead of abusing something else.
>
> I propose to make a union with v4l2_timecode (which nobody uses) and a
> new V4L2_BUF_FLAG_COOKIE flag.
>
> struct v4l2_buffer_cookie {
> __u32 high;
> __u32 low;
> };
>
> And in v4l2_buffer:
>
> union {
> struct v4l2_timecode timecode;
> struct v4l2_buffer_cookie cookie;
> };
>
> And static inlines:
>
> void v4l2_buffer_set_cookie(struct v4l2_buffer *buf, __u64 cookie)
> {
> buf->cookie.high = cookie >> 32;
> buf->cookie.low = cookie & 0xffffffffULL;
> buf->flags |= V4L2_BUF_FLAG_COOKIE;
> }
>
> void v4l2_buffer_set_cookie_ptr(struct v4l2_buffer *buf, void *cookie)
> {
> v4l2_buffer_set_cookie(buf, (__u64)cookie);
> }
>
> __u64 v4l2_buffer_get_cookie(struct v4l2_buffer *buf)
> {
> if (!(buf->flags & V4L2_BUF_FLAG_COOKIE))
> return 0;
> return (((__u64)buf->cookie.high) << 32) | (__u64)buf->cookie.low;
> }
>
> void *v4l2_buffer_get_cookie_ptr(struct v4l2_buffer *buf)
> {
> return (void *)v4l2_buffer_get_cookie(buf);
> }
>
> Why not just use __u64? Because the alignment in v4l2_buffer is a nightmare.
> Using __u64 would create holes, made even worse by different struct timeval
> sizes depending on the architecture.
>
> I'm proposing a struct v4l2_ext_buffer together with new streaming ioctls
> during the media summit that has a clean layout and there this can be just
> a __u64.
>
> I'm calling it a 'cookie' here, but that's just a suggestion. Better
> names are welcome.

Cheers,

Paul

--
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

Attachment: signature.asc
Description: This is a digitally signed message part