Re: [PATCH 0/7] Raspberry Pi HEVC decoder driver

From: Nicolas Dufresne
Date: Wed Jan 08 2025 - 13:40:18 EST


Le mardi 07 janvier 2025 à 16:13 +0000, Dave Stevenson a écrit :
> Hi Nicolas
>
> On Mon, 6 Jan 2025 at 20:46, Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> >
> > Hi Dave,
> >
> > Le vendredi 20 décembre 2024 à 16:21 +0000, Dave Stevenson a écrit :
> > > Hi All
> > >
> > > This has been in the pipeline for a while, but I've finally cleaned
> > > up our HEVC decoder driver to be in a shape to at least get a first
> > > review.
> > > John Cox has done almost all of the work under contract to Raspberry
> > > Pi, and I'm largely just doing the process of patch curation and
> > > sending.
> > >
> > > There are a couple of questions raised in frameworks.
> > > The main one is that the codec has 2 independent phases to the decode,
> > > CABAC and reconstruction. To keep the decoder operating optimally
> > > means that two requests need to be in process at once, whilst the
> > > current frameworks don't want to allow as there is an implicit
> > > assumption of only a single job being active at once, and
> > > completition returns both buffers and releases the media request.
> > >
> > > The OUTPUT queue buffer is finished with and can be returned at the
> > > end of phase 1, but the media request is still required for phase 2.
> > > The frameworks currently force the driver to be returning both
> > > together via v4l2_m2m_buf_done_and_job_finish. v4l2_m2m_job_finish
> > > would complete the job without returning the buffer as we need,
> > > however if the driver has set VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF
> > > then we have a WARN in v4l2_m2m_job_finish.
> > > Dropping the WARN as this series is currently doing isn't going to be
> > > the right answer, but it isn't obvious what the right answer is.
> > > Discussion required.
> >
> > I think part of the manual request completion RFC will be to evaluate the impact
> > on VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF feature. MTK does not support
> > interleaved interlaced decoding (only alternate), so they didn't have to
> > implement that feature.
> >
> > Overall, It would be nice to get your feedback on the new manual request
> > proposal, which is I believe better then the pin/unpin API you have in this
> > serie.
>
> I wasn't aware of that series, but I / John will take a look.
>
> > >
> > > We also have a need to hold on to the media request for phase 2. John
> > > had discussed this with Ezequiel (and others) a couple of years back,
> > > and hence suggested a patch that adds media_request_{pin,unpin} to
> > > grab references on the media request. Discussion required on that
> > > or a better way of handling it.
> > >
> > > I will apologise in advance for sending this V1 just before I head off
> > > on the Christmas break, but will respond to things as soon as possible.
> >
> > One thing missing in this summary is how this driver is being validated
> > (specially that for this one requires a downstream fork of FFMPEG). To this
> > report we ask for:
> >
> > - v4l2-compliance results
> > - Fluster conformance tests results [1] and I believe you need [2]
> >
> > [1] https://github.com/fluendo/fluster
> > [2] https://github.com/fluendo/fluster/pull/179
>
> Sure, I'll sort that before doing a V2.
>
> > GStreamer support is there in main now, but without the needed software video
> > converter for you column tiling, we can't use it for that (i.e. only works
> > through GL or Wayland).
>
> Can you point me at the right place for the software converter?
> It's a relatively trivial reformat required to get it back into NV12 /
> I420 or 10bit equivalents, so happy to do that. I think John already
> has NEON optimised code if desired.

Its challenging in the way its implemented in GStreamer Video library. Basically
all formats needs its GstVideoFormatInfo structure to be filled in the static
table, and then minimally a slow path for color conversion, which is reduced to
pack/unpack of a single line of video data. With tiled (or column format) the
notion of line is not as obvious.

https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-base/gst-libs/gst/video/video-format.c

For tile format with fixed dimensions, we have extra information for the tile
layout (GstVideoTileInfo), but we didn't think about full height tiles when that
was added. This information is needed to generically crop images, which is why
its so complicated.

One option we could use to make it fit as a tiled format is to introduce a new
GstVideoTileMode GST_VIDEO_TILE_MODE_COLUMN. Then we set the tiled height to
match the usual HW height step, making it so you have 128 x step tiles, laid out
in column. Alternatively, we can start with a step of 1 (which will make the
slow path a lot slower then needed), and later add an optimization for the
column layout, so it handle full columns.

If you manage to find a good step size, then all you will have to implement is
the backend for gst_video_tile_get_index(), the rest of the slow path will be
generic.

Fast path are possible, with ORC (a cross platform byte code language that can
be JIT into SIMD instructions), but I don't currently see much use for "fast",
since the GPU pretty much always accepts these buffers, so the software path
remains a neat debugging tool and allow conformance testing.

Nicolas

>
> Dave
>
> > regards,
> > Nicolas
> >
> > >
> > > Thanks
> > > Dave
> > >
> > > Signed-off-by: Dave Stevenson <dave.stevenson@xxxxxxxxxxxxxxx>
> > > ---
> > > Dave Stevenson (4):
> > > docs: uapi: media: Document Raspberry Pi NV12 column format
> > > media: ioctl: Add pixel formats NV12MT_COL128 and NV12MT_10_COL128
> > > media: dt-bindings: media: Add binding for the Raspberry Pi HEVC decoder
> > > arm: dts: bcm2711-rpi: Add HEVC decoder node
> > >
> > > Ezequiel Garcia (1):
> > > RFC: media: Add media_request_{pin,unpin} API
> > >
> > > John Cox (2):
> > > media: platform: Add Raspberry Pi HEVC decoder driver
> > > RFC: v4l2-mem2mem: Remove warning from v4l2_m2m_job_finish
> > >
> > > .../bindings/media/raspberrypi,hevc-dec.yaml | 72 +
> > > .../userspace-api/media/v4l/pixfmt-yuv-planar.rst | 42 +
> > > MAINTAINERS | 10 +
> > > arch/arm/boot/dts/broadcom/bcm2711-rpi.dtsi | 5 +
> > > arch/arm/boot/dts/broadcom/bcm2711.dtsi | 9 +
> > > drivers/media/mc/mc-request.c | 35 +
> > > drivers/media/platform/raspberrypi/Kconfig | 1 +
> > > drivers/media/platform/raspberrypi/Makefile | 1 +
> > > .../media/platform/raspberrypi/hevc_dec/Kconfig | 17 +
> > > .../media/platform/raspberrypi/hevc_dec/Makefile | 5 +
> > > .../media/platform/raspberrypi/hevc_dec/hevc_d.c | 443 ++++
> > > .../media/platform/raspberrypi/hevc_dec/hevc_d.h | 190 ++
> > > .../platform/raspberrypi/hevc_dec/hevc_d_h265.c | 2629 ++++++++++++++++++++
> > > .../platform/raspberrypi/hevc_dec/hevc_d_hw.c | 376 +++
> > > .../platform/raspberrypi/hevc_dec/hevc_d_hw.h | 303 +++
> > > .../platform/raspberrypi/hevc_dec/hevc_d_video.c | 685 +++++
> > > .../platform/raspberrypi/hevc_dec/hevc_d_video.h | 38 +
> > > drivers/media/v4l2-core/v4l2-ioctl.c | 2 +
> > > drivers/media/v4l2-core/v4l2-mem2mem.c | 7 -
> > > include/media/media-request.h | 12 +
> > > include/uapi/linux/videodev2.h | 5 +
> > > 21 files changed, 4880 insertions(+), 7 deletions(-)
> > > ---
> > > base-commit: e90c9612ac3969cb8206029a26bcd2b6f5d4a942
> > > change-id: 20241212-media-rpi-hevc-dec-3b5be739f3bd
> > >
> > > Best regards,
> >