Re: [PATCH 0/7] Raspberry Pi HEVC decoder driver
From: Nicolas Dufresne
Date: Wed Jan 08 2025 - 14:02:42 EST
Hi Dave,
Le mardi 07 janvier 2025 à 17:36 +0000, Dave Stevenson a écrit :
> On Tue, 7 Jan 2025 at 16:13, Dave Stevenson
> <dave.stevenson@xxxxxxxxxxxxxxx> wrote:
> >
> > Hi Nicolas
> >
> > On Mon, 6 Jan 2025 at 20:46, Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
> > >
> > > Hi Dave,
> > >
> > > Le vendredi 20 décembre 2024 à 16:21 +0000, Dave Stevenson a écrit :
> > > > Hi All
> > > >
> > > > This has been in the pipeline for a while, but I've finally cleaned
> > > > up our HEVC decoder driver to be in a shape to at least get a first
> > > > review.
> > > > John Cox has done almost all of the work under contract to Raspberry
> > > > Pi, and I'm largely just doing the process of patch curation and
> > > > sending.
> > > >
> > > > There are a couple of questions raised in frameworks.
> > > > The main one is that the codec has 2 independent phases to the decode,
> > > > CABAC and reconstruction. To keep the decoder operating optimally
> > > > means that two requests need to be in process at once, whilst the
> > > > current frameworks don't want to allow as there is an implicit
> > > > assumption of only a single job being active at once, and
> > > > completition returns both buffers and releases the media request.
> > > >
> > > > The OUTPUT queue buffer is finished with and can be returned at the
> > > > end of phase 1, but the media request is still required for phase 2.
> > > > The frameworks currently force the driver to be returning both
> > > > together via v4l2_m2m_buf_done_and_job_finish. v4l2_m2m_job_finish
> > > > would complete the job without returning the buffer as we need,
> > > > however if the driver has set VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF
> > > > then we have a WARN in v4l2_m2m_job_finish.
> > > > Dropping the WARN as this series is currently doing isn't going to be
> > > > the right answer, but it isn't obvious what the right answer is.
> > > > Discussion required.
> > >
> > > I think part of the manual request completion RFC will be to evaluate the impact
> > > on VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF feature. MTK does not support
> > > interleaved interlaced decoding (only alternate), so they didn't have to
> > > implement that feature.
> > >
> > > Overall, It would be nice to get your feedback on the new manual request
> > > proposal, which is I believe better then the pin/unpin API you have in this
> > > serie.
> >
> > I wasn't aware of that series, but I / John will take a look.
>
> As I said at the beginning, I'm largely an intermediary here, and may
> get things wrong as my codec knowledge is far from comprehensive. I'm
> hoping John will correct me if I misrepresent our conversations.
>
> As you say the MTK codec doesn't set
> VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF, and hence it can call
> v4l2_m2m_job_finish without hitting the WARN.
>
> Your comment about it not supporting interleaved interlaced decoding
> confuses me slightly. Almost all the codecs allow a single frame to be
> encoded as multiple slices, and I'd be surprised if none of the test
> streams exercise that.
MTK VCODEC is a frame based HEVC decoder. All slices are passed within the same
request (up to 600). Upstream, only Cedrus driver is slice based, though for
large number of slices this has performance issues (even though you gain on
latency).
> Our reading of the situation was that if you have more than one
> encoded slice making up the video frame then you are NOT obliged to
> submit all of the slices at once via the variable length array
> control, and submitting the slices one at a time is permitted. That
> means that almost all decoders have to set the
> VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF flag to be able to hold onto
> the CAPTURE buffer until it has been given all the encoded data to
> generate it.
We don't have support for that, a new v4l2_stateless_hevc_decode_mode would be
required. Here's what the spec says:
* - ``V4L2_STATELESS_HEVC_DECODE_MODE_SLICE_BASED``
- 0
- Decoding is done at the slice granularity.
The OUTPUT buffer must contain a **single** slice.
* - ``V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED``
- 1
- Decoding is done at the frame granularity.
The OUTPUT buffer must contain **all** slices needed to decode the
frame.
So one, or all. Additionally, (and doc could be improved here),
V4L2_CID_STATELESS_HEVC_SLICE_PARAMS is strictly required for SLICED_BASED, and
must contain 1 slice. While for the second, some drivers needs it, some don't,
userspace needs to check if the control is present or not.
SLICE_BASED depends on VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF, otherwise you
can't queue capture buffers ahead of time, and loose in throughput. So a new
DYN_SLICE_BASED (with some proper name), would also require it.
VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF is also needed when you need to decode
top and bottom field in the same capture buffer, hence my reference to that (MTK
chromebook firmware don't support that for any codecs).
You could also ignore all this, and implement
V4L2_STATELESS_HEVC_DECODE_MODE_FRAME_BASED on top of your slice based decoder.
No latency gain, but you also no longer need
VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF which let you postpone adding new
helpers to complete the job when this feature is used. Yet, I think the manual
request completion proposal would be a lot better if we had that issue covered.
>
> If there isn't a need to support a multi-slice frame being presented
> via multiple OUTPUT buffers, then we can cull some code and drop the
> flag.
> HEVC has no real concept of interlaced content, so no need to worry
> about that as the other route to having multiple slices producing one
> video frame.
I've seen non-tiled decoders that internally double the stride, and offset the
same buffer by one line for the second field, to generate interleaved data, but
as you said, this is the exception rather then the rule. :-)
p.s. your understanding is pretty accurate
>
> Dave
>
> > > >
> > > > We also have a need to hold on to the media request for phase 2. John
> > > > had discussed this with Ezequiel (and others) a couple of years back,
> > > > and hence suggested a patch that adds media_request_{pin,unpin} to
> > > > grab references on the media request. Discussion required on that
> > > > or a better way of handling it.
> > > >
> > > > I will apologise in advance for sending this V1 just before I head off
> > > > on the Christmas break, but will respond to things as soon as possible.
> > >
> > > One thing missing in this summary is how this driver is being validated
> > > (specially that for this one requires a downstream fork of FFMPEG). To this
> > > report we ask for:
> > >
> > > - v4l2-compliance results
> > > - Fluster conformance tests results [1] and I believe you need [2]
> > >
> > > [1] https://github.com/fluendo/fluster
> > > [2] https://github.com/fluendo/fluster/pull/179
> >
> > Sure, I'll sort that before doing a V2.
> >
> > > GStreamer support is there in main now, but without the needed software video
> > > converter for you column tiling, we can't use it for that (i.e. only works
> > > through GL or Wayland).
> >
> > Can you point me at the right place for the software converter?
> > It's a relatively trivial reformat required to get it back into NV12 /
> > I420 or 10bit equivalents, so happy to do that. I think John already
> > has NEON optimised code if desired.
> >
> > Dave
> >
> > > regards,
> > > Nicolas
> > >
> > > >
> > > > Thanks
> > > > Dave
> > > >
> > > > Signed-off-by: Dave Stevenson <dave.stevenson@xxxxxxxxxxxxxxx>
> > > > ---
> > > > Dave Stevenson (4):
> > > > docs: uapi: media: Document Raspberry Pi NV12 column format
> > > > media: ioctl: Add pixel formats NV12MT_COL128 and NV12MT_10_COL128
> > > > media: dt-bindings: media: Add binding for the Raspberry Pi HEVC decoder
> > > > arm: dts: bcm2711-rpi: Add HEVC decoder node
> > > >
> > > > Ezequiel Garcia (1):
> > > > RFC: media: Add media_request_{pin,unpin} API
> > > >
> > > > John Cox (2):
> > > > media: platform: Add Raspberry Pi HEVC decoder driver
> > > > RFC: v4l2-mem2mem: Remove warning from v4l2_m2m_job_finish
> > > >
> > > > .../bindings/media/raspberrypi,hevc-dec.yaml | 72 +
> > > > .../userspace-api/media/v4l/pixfmt-yuv-planar.rst | 42 +
> > > > MAINTAINERS | 10 +
> > > > arch/arm/boot/dts/broadcom/bcm2711-rpi.dtsi | 5 +
> > > > arch/arm/boot/dts/broadcom/bcm2711.dtsi | 9 +
> > > > drivers/media/mc/mc-request.c | 35 +
> > > > drivers/media/platform/raspberrypi/Kconfig | 1 +
> > > > drivers/media/platform/raspberrypi/Makefile | 1 +
> > > > .../media/platform/raspberrypi/hevc_dec/Kconfig | 17 +
> > > > .../media/platform/raspberrypi/hevc_dec/Makefile | 5 +
> > > > .../media/platform/raspberrypi/hevc_dec/hevc_d.c | 443 ++++
> > > > .../media/platform/raspberrypi/hevc_dec/hevc_d.h | 190 ++
> > > > .../platform/raspberrypi/hevc_dec/hevc_d_h265.c | 2629 ++++++++++++++++++++
> > > > .../platform/raspberrypi/hevc_dec/hevc_d_hw.c | 376 +++
> > > > .../platform/raspberrypi/hevc_dec/hevc_d_hw.h | 303 +++
> > > > .../platform/raspberrypi/hevc_dec/hevc_d_video.c | 685 +++++
> > > > .../platform/raspberrypi/hevc_dec/hevc_d_video.h | 38 +
> > > > drivers/media/v4l2-core/v4l2-ioctl.c | 2 +
> > > > drivers/media/v4l2-core/v4l2-mem2mem.c | 7 -
> > > > include/media/media-request.h | 12 +
> > > > include/uapi/linux/videodev2.h | 5 +
> > > > 21 files changed, 4880 insertions(+), 7 deletions(-)
> > > > ---
> > > > base-commit: e90c9612ac3969cb8206029a26bcd2b6f5d4a942
> > > > change-id: 20241212-media-rpi-hevc-dec-3b5be739f3bd
> > > >
> > > > Best regards,
> > >