Re: [PATCH V2] [media] v4l2: Add AV1 pixel format

From: Hsia-Jun Li
Date: Thu Dec 15 2022 - 21:56:10 EST




On 12/16/22 04:18, Nicolas Dufresne wrote:


On Thursday, 8 December 2022 at 10:39 +0800, Hsia-Jun Li wrote:

On 12/8/22 01:16, Nicolas Dufresne wrote:


On Wednesday, 7 December 2022 at 15:18 +0800, Hsia-Jun Li wrote:

On 12/7/22 02:03, Nicolas Dufresne wrote:


On Tuesday, 29 November 2022 at 18:32 +0800, Hsia-Jun Li wrote:
Hello

I think we need to add an extra event for VP9 and AV1, which support
frame scaling: a frame's width and height can differ from those of the
previous frame or of its reference frames.

This is more likely with VP9, since VP9 has no sequence
header.

The solution is unlikely to take the form of an event, but yes, to complete VP9
support (and improve AV1 support) a mechanism needs to be designed and specified
to handle inter-frame resolution changes.

Why I say improve AV1: the VP9 bitstream does not signal SVC spatial
streams (the most common use of inter-frame resolution changes). With SVC
streams, the smaller images are always decode-only (never displayed). This can be

We expect to get all the results from the different layers of an SVC stream.
Which layer gets displayed is the user's decision.

This is detached from reality. It's the conferencing server that measures each participant's
bandwidth and decides how many layers each one should get to avoid overloading
I think the majority would prefer P2P mode with forwarding. Server-side negotiation and re-encoding are still in use, but they introduce lag that people don't want.
the network. The selection happens in the compressed domain. Perhaps someone could
be creative and make up a use case for what you describe, but this isn't used in
practice.
I know of at least two commercial software products that support this.

Just consider a simpler case, not SVC. Can VP9 and AV1 allow the current frame to have a lower resolution and worse quality than its reference frame? I think the answer is certainly yes; that is the idea of the golden frame.
So "the smaller images are always decode-only (never displayed)" is not right.
Only the inverted case is a practice of SVC.

Adding multi-resolution output requires massive API additions in
stateful decoders (which this hidden RFC does not cover).
Yes, it would be better to start a new topic.


1. A golden frame would usually have a higher resolution or better quality,
but we would only display the lower-resolution frames.

2. A higher-resolution or higher-quality layer would have a longer frame
interval; a user may just display a lower layer in a real-time case (like video conferencing).

at least partially supported as long as the maximum image dimension is signalled
by the bitstream. This is the case for AV1, but not VP9.

Stateless decoders are not affected, because userspace is aware of frames being
decoded but not displayed. It is also aware that these frames are reference
frames. On a stateful decoder, by contrast, userspace usually does not have this
knowledge. I think one way to solve this would be for drivers to be able to
mark a buffer done with a flag telling userspace that it is not to be displayed.
For the SVC case, the dimensions and stride are irrelevant.

For true inter-resolution changes, like VP9 supports (though rarely used), this
needs more APIs. It was suggested to extend CREATE_BUFS, which allows allocation
with a different FMT, with a DELETE_BUFS ioctl, so that userspace can smoothly
handle the allocation transition.
This would only solve the problem of never-displayed graphics buffers
like the golden frame or the alternate reference frame.

Regarding the timestamp tracking problem in V4L2, maybe we could start
a new thread or move the discussion to GStreamer:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/1619

My idea here is an event attached to a buffer, or just using the new request
support on the CAPTURE side. I know you worry about V4L2 events being
out of band; more events could lead to the problem we suffered from in
OpenMAX. If we could guarantee an ordering between events and buffers, it
wouldn't be a problem.
For VP9 also, it might be required to support
super-frames. VP9 super-frames are the ancestor of the AV1 TU, and only the last
frame of a super-frame is ever to be displayed. A newly introduced AV1 format
might also require complete TUs, rather than frames; this needs strict
documentation.
I don't think the temporal unit is a good idea here.
Most hardware can only decode one frame at a time, or even less, such as a
tile (similar to a slice in the ITU codecs).

Considering the MPEG-TS case
(https://aomediacodec.github.io/av1-mpeg2-ts/),
a Decodable Frame Group could be a better idea.
A Temporal Unit would lead to larger delay.
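As background for the super-frame point above, the VP9 superframe index (Annex B of the VP9 bitstream spec) is compact enough to sketch a parser for. This is a minimal illustration under my own naming (`vp9_parse_superframe` is not a kernel function), not production-hardened code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal VP9 superframe index parser (VP9 spec, Annex B).  A
 * superframe packs several frames back to back and appends an index
 * whose first and last bytes both carry the marker 0b110 in their top
 * three bits; only the last frame of a superframe is shown.  Fills
 * sizes[] with each frame's byte size and returns the frame count,
 * or 0 when no index is present (the chunk is a single frame). */
static size_t vp9_parse_superframe(const uint8_t *buf, size_t len,
                                   size_t sizes[8])
{
    if (len < 2)
        return 0;

    uint8_t marker = buf[len - 1];
    if ((marker & 0xe0) != 0xc0)        /* superframe_marker == 0b110 */
        return 0;

    size_t frames = (marker & 0x07) + 1;     /* frames_in_superframe */
    size_t mag = ((marker >> 3) & 0x03) + 1; /* bytes per size field */
    size_t index_sz = 2 + mag * frames;

    /* The index is framed by two identical marker bytes. */
    if (len < index_sz || buf[len - index_sz] != marker)
        return 0;

    const uint8_t *p = buf + len - index_sz + 1;
    for (size_t i = 0; i < frames; i++) {
        size_t sz = 0;
        for (size_t j = 0; j < mag; j++)     /* little-endian */
            sz |= (size_t)*p++ << (8 * j);
        sizes[i] = sz;
    }
    return frames;
}
```

With something like this, the entity splitting the bitstream can tell how many frames a queued buffer really contains, which is the information the "one buffer = one frame vs. one super-frame/TU" debate hinges on.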

This is off topic for the thread, but this one could be fixed by setting a flag
on the capture buffer, something like:

V4L2_BUF_FLAG_DECODE_ONLY

That's similar to how it works with other codec APIs. The downside is that the
driver needs to remember whether this is a reference frame when userspace queues
back that decode-only frame, so it is not overwritten. Userspace is not aware of
the reference state, hence can't be made responsible. I suspect a lot of the
drivers out there use secondary buffers, meaning the references are not the
CAPTURE buffers.
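A rough userspace-side sketch of the proposal: note that neither the flag name nor the value below exists in the current V4L2 UAPI; both are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: not part of the current V4L2 UAPI, only an
 * illustration of the proposal discussed above. */
#define V4L2_BUF_FLAG_DECODE_ONLY 0x01000000u

/* Stands in for the index/flags pair of struct v4l2_buffer. */
struct fake_buf {
    unsigned int index;
    unsigned int flags;
};

/* A dequeued CAPTURE buffer carrying the flag would be requeued
 * straight away instead of being sent to the display path. */
static bool should_display(const struct fake_buf *b)
{
    return !(b->flags & V4L2_BUF_FLAG_DECODE_ONLY);
}
```

The dequeue loop then becomes a simple branch: display and requeue later, or requeue immediately, with the driver left responsible for not overwriting flagged buffers that are still referenced.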
Drivers certainly could allocate their own internal buffers, but I believe
Android won't like this idea. They would prefer that you allocate the buffer
somewhere else and then import it into the driver.

Besides, when you decode a secure (DRM) stream, you don't want to leak
any data from it, while for normal streams you don't want to occupy
the limited secure memory zone. I would like to let
userspace control the allocation of those internal buffers.

This use case needs to be thought through too. Perhaps other drivers use
internally allocated memory whenever they are about to produce a decode-only
frame, but that seems to require some firmware feature that is likely uncommon.
Please do your research, compare various drivers, and propose an API in the form
of an RFC so we can discuss it independently from this AV1 pixel format thread.
My proposal for solving the timestamp tracking issue is making V4L2
events ordered relative to buffers.

It will come after I refresh the V4L2 pixel format extension API.


Decoding frames would mean that non-displayed frames and frames of different
sizes get delivered, and we don't have a method to communicate these frame
dimensions and strides at the moment.

Nicolas




On 9/12/22 23:45, Nicolas Dufresne wrote:
Hi Shi,

thanks for the patches, check inline for some comments. Generally speaking, we
don't usually add formats ahead of time unless we have a good rationale to do
so. Should we expect a companion series against the Amlogic decoder driver that
enables this?

On Tuesday, 30 August 2022 at 09:40 +0800, Shi Hao wrote:
From: "hao.shi" <hao.shi@xxxxxxxxxxx>

Add the AV1 compressed pixel format, which is an increasingly common format.

Signed-off-by: Hao Shi <hao.shi@xxxxxxxxxxx>
---
.../userspace-api/media/v4l/pixfmt-compressed.rst | 9 +++++++++
drivers/media/v4l2-core/v4l2-ioctl.c | 1 +
include/uapi/linux/videodev2.h | 1 +
3 files changed, 11 insertions(+)

diff --git a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
index 506dd3c98884..5bdeeebdf9f5 100644
--- a/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
+++ b/Documentation/userspace-api/media/v4l/pixfmt-compressed.rst
@@ -232,6 +232,15 @@ Compressed Formats
Metadata associated with the frame to decode is required to be passed
through the ``V4L2_CID_STATELESS_FWHT_PARAMS`` control.
See the :ref:`associated Codec Control ID <codec-stateless-fwht>`.
+ * .. _V4L2-PIX-FMT-AV1:
+
+ - ``V4L2_PIX_FMT_AV1``
+ - 'AV1'
+ - AV1 Access Unit. The decoder expects one Access Unit per buffer.

I believe this is using MPEG LA terminology. Did you mean a Temporal Unit (TU)?
In AV1 a TU represents one displayable picture, just like an AU in H.264 (if you
ignore interlaced video).
I think it should be a complete tile group OBU. From the spec, we have
the term 'frame'.

Currently, AV1 doesn't support interlacing.
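To make the frame-vs-TU question concrete: an AV1 TU boundary is marked by a temporal-delimiter OBU, so splitting a low-overhead byte stream into TUs means walking OBU headers. A rough sketch (assuming obu_has_size_field is set for every OBU, and with deliberately minimal error handling; `av1_count_tus` is my own name, not an existing API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OBU_TEMPORAL_DELIMITER 2

/* Counts temporal units in a low-overhead-format AV1 byte stream by
 * counting temporal-delimiter OBUs.  Each OBU header byte carries the
 * obu_type in bits 6..3, an extension flag in bit 2 and a
 * has_size_field flag in bit 1; the payload size follows as leb128. */
static int av1_count_tus(const uint8_t *buf, size_t len)
{
    int tus = 0;
    size_t pos = 0;

    while (pos < len) {
        uint8_t hdr = buf[pos++];
        int type = (hdr >> 3) & 0x0f;     /* obu_type */

        if (hdr & 0x04)                   /* obu_extension_flag */
            pos++;                        /* skip tid/sid byte */

        uint64_t size = 0;                /* leb128 payload size */
        for (int i = 0; i < 8 && pos < len; i++) {
            uint8_t b = buf[pos++];
            size |= (uint64_t)(b & 0x7f) << (7 * i);
            if (!(b & 0x80))
                break;
        }

        if (type == OBU_TEMPORAL_DELIMITER)
            tus++;
        pos += size;                      /* skip the OBU payload */
    }
    return tus;
}
```

Whether hardware then consumes one frame OBU, one tile group, or a whole TU per buffer is exactly what the pixel format documentation would need to pin down.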

+ The encoder generates one Access Unit per buffer. This format is
+ adapted for stateful video decoders. AV1 (AOMedia Video 1) is an
+ open video coding format. It was developed as a successor to VP9
+ by the Alliance for Open Media (AOMedia).

.. raw:: latex

diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index c314025d977e..fc0f43228546 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1497,6 +1497,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
+ case V4L2_PIX_FMT_AV1: descr = "AV1"; break;
default:
if (fmt->description[0])
return;
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 01e630f2ec78..c5ea9f38d807 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -738,6 +738,7 @@ struct v4l2_pix_format {
#define V4L2_PIX_FMT_FWHT_STATELESS v4l2_fourcc('S', 'F', 'W', 'H') /* Stateless FWHT (vicodec) */
#define V4L2_PIX_FMT_H264_SLICE v4l2_fourcc('S', '2', '6', '4') /* H264 parsed slices */
#define V4L2_PIX_FMT_HEVC_SLICE v4l2_fourcc('S', '2', '6', '5') /* HEVC parsed slices */
+#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0') /* AV1 */

/* Vendor-specific formats */
#define V4L2_PIX_FMT_CPIA1 v4l2_fourcc('C', 'P', 'I', 'A') /* cpia1 YUV */

base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
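For completeness, the fourcc the patch picks can be checked in isolation; the macro below mirrors the v4l2_fourcc() definition from include/uapi/linux/videodev2.h (four characters packed little-endian into a 32-bit code):

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the v4l2_fourcc() macro from videodev2.h. */
#define v4l2_fourcc(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* The code added by this patch: the four characters 'A' 'V' '1' '0'. */
#define V4L2_PIX_FMT_AV1 v4l2_fourcc('A', 'V', '1', '0')
```

So the code reads back as the string "AV10" and has the numeric value 0x30315641.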









--
Hsia-Jun(Randy) Li