Re: [PATCH 2/2] [WIP]: media: Add Synaptics compressed tiled format
From: Nicolas Dufresne
Date: Mon Aug 22 2022 - 10:15:25 EST
Le samedi 20 août 2022 à 08:10 +0800, Hsia-Jun Li a écrit :
>
> On 8/20/22 03:17, Nicolas Dufresne wrote:
> >
> >
> > Le vendredi 19 août 2022 à 23:44 +0800, Hsia-Jun Li a écrit :
> > >
> > > On 8/19/22 23:28, Nicolas Dufresne wrote:
> > > >
> > > >
> > > > Le vendredi 19 août 2022 à 02:13 +0300, Laurent Pinchart a écrit :
> > > > > On Thu, Aug 18, 2022 at 02:33:42PM +0800, Hsia-Jun Li wrote:
> > > > > > On 8/18/22 14:06, Tomasz Figa wrote:
> > > > > > > On Tue, Aug 9, 2022 at 1:28 AM Hsia-Jun Li <randy.li@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > From: "Hsia-Jun(Randy) Li" <randy.li@xxxxxxxxxxxxx>
> > > > > > > >
> > > > > > > > The most of detail has been written in the drm.
> > > > >
> > > > > This patch still needs a description of the format, which should go to
> > > > > Documentation/userspace-api/media/v4l/.
> > > > >
> > > > > > > > Please notice that the tiled formats here request
> > > > > > > > one more plane for storing the motion vector metadata.
> > > > > > > > This buffer won't be compressed, so you can't append
> > > > > > > > it to luma or chroma plane.
> > > > > > >
> > > > > > > Does the motion vector buffer need to be exposed to userspace? Is the
> > > > > > > decoder stateless (requires userspace to specify the reference frames)
> > > > > > > or stateful (manages the entire decoding process internally)?
> > > > > >
> > > > > > No, users don't need to access them at all. Just they need a different
> > > > > > dma-heap.
> > > > > >
> > > > > > You would only get the stateful version of both encoder and decoder.
> > > > >
> > > > > Shouldn't the motion vectors be stored in a separate V4L2 buffer,
> > > > > submitted through a different queue then ?
> > > >
> > > > Imho, I believe these should be invisible to users and pooled separately to
> > > > reduce the overhead. The number of references is usually lower than the
> > > > number of allocated display buffers.
> > > >
> > > >
> > > You can't. The motion vector buffer can't be shared with the luma and chroma
> > > data planes, nor with the data plane for the compression metadata.
> > >
> > > You could consider this a security requirement (the memory region for
> > > the MV may only be accessed by the decoder) or a hardware limitation.
> > >
> > > It is also not very easy to manage such a large buffer whose size changes
> > > when the resolution changes.
> >
> > Your arguments all point toward the fact that you should not let the user
> > allocate these in the first place. They should not be bound to the v4l2 buffer.
> > Allocate them in your driver, and leave the pixel buffer (and compression
> > metadata) allocation work to your user.
> >
> What I want to say is that userspace could allocate buffers and then make
> the v4l2 decoder import these buffers, but each plane should come from
> the right DMA-heap. Usually userspace knows the memory occupation
> better, so this would bring some flexibility here.
>
> Currently, there is another thing that bothers me: I need to allocate a small
> piece of memory (less than 128 KiB) for the compression metadata buffers, as
> I mentioned here. These pieces of memory should be located in a
> small region, or performance could be badly hurt; besides, we don't
> support IOMMU for this kind of data.
>
> Any idea on how to assign a small piece of memory from a pre-allocated
> memory area or a selected region (I don't think I could reserve them in a
> DMA-heap) for a plane in an MMAP-type buffer?
A V4L2 driver should first implement the V4L2 semantics before adding optional
use cases like buffer importation. For this reason, your V4L2 driver should know
all the memory requirements and how to allocate that memory. Later on, your
importing driver will have to validate that userland did it right at
importation time. This follows the V4L2 semantics and security model. If you
simply trust userland (gralloc), you are not doing it right.
>
> Besides, I am not very satisfied with the dynamic resolution change
> steps, if I understand them correctly. Buffer reallocation should happen
> when we receive the event, not only after the drain is done. A resolution
> increase is very common when you are playing a network stream; it would be
> better if the decoder decided how many buffers it needs for the
> previous sequence while userspace reallocates the rest of the
> buffers in the CAPTURE queue.
> > Other drivers handle this just fine; if your v4l2 driver implements the v4l2
> > resolution change mechanism, it should be very simple to manage.
This is a limitation of the queue design of V4L2. While streaming, the buffers
associated with the queue must currently be large enough to support the selected
format. "Large enough" in your case is complex, and the validation must be
programmed in your driver.
There has been discussion about perhaps extending the CREATE_BUFS feature, which
lets you allocate at run time for a different format/resolution (no driver
currently allows that). The rules around that aren't specified (and will have to
be defined before a driver starts making use of it). Note that to be usable, a
DELETE_BUFS ioctl would probably be needed too.
In the current state, if your driver can support it, userland does not strictly
need to re-allocate if the resolution changes to a smaller one. In most SVC
scenarios, the largest resolution is known in advance, so pre-allocation can
happen for the appropriate resolution and queue size. Re-allocation is then
rarely triggered at run time. Unlike your system, IOMMU systems are not as
affected by allocation latency and manage to do gapless transitions despite this
inefficiency.
Note that all this is pure recommendation. What I'm seeing here is a pixel
format documented with Android assumptions rather than mainline ones, and sent
without the associated implementation. This simply raises some questions about
the viability of the whole thing. This is not a criticism, just some
verification to ensure you are following the V4L2 spec.
> >
> > > > >
> > > > > > > > Signed-off-by: Hsia-Jun(Randy) Li <randy.li@xxxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > drivers/media/v4l2-core/v4l2-common.c | 1 +
> > > > > > > > drivers/media/v4l2-core/v4l2-ioctl.c | 2 ++
> > > > > > > > include/uapi/linux/videodev2.h | 2 ++
> > > > > > > > 3 files changed, 5 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/media/v4l2-core/v4l2-common.c b/drivers/media/v4l2-core/v4l2-common.c
> > > > > > > > index e0fbe6ba4b6c..f645278b3055 100644
> > > > > > > > --- a/drivers/media/v4l2-core/v4l2-common.c
> > > > > > > > +++ b/drivers/media/v4l2-core/v4l2-common.c
> > > > > > > > @@ -314,6 +314,7 @@ const struct v4l2_format_info *v4l2_format_info(u32 format)
> > > > > > > > { .format = V4L2_PIX_FMT_SGBRG12, .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > > > { .format = V4L2_PIX_FMT_SGRBG12, .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > > > { .format = V4L2_PIX_FMT_SRGGB12, .pixel_enc = V4L2_PIXEL_ENC_BAYER, .mem_planes = 1, .comp_planes = 1, .bpp = { 2, 0, 0, 0 }, .hdiv = 1, .vdiv = 1 },
> > > > > > > > + { .format = V4L2_PIX_FMT_NV12M_V4H1C, .pixel_enc = V4L2_PIXEL_ENC_YUV, .mem_planes = 5, .comp_planes = 2, .bpp = { 1, 2, 0, 0 }, .hdiv = 2, .vdiv = 2, .block_w = { 128, 128 }, .block_h = { 128, 128 } },
> > > > > > > > };
> > > > > > > > unsigned int i;
> > > > > > > >
> > > > > > > > diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > > index e6fd355a2e92..8f65964aff08 100644
> > > > > > > > --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > > +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> > > > > > > > @@ -1497,6 +1497,8 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
> > > > > > > > case V4L2_PIX_FMT_MT21C: descr = "Mediatek Compressed Format"; break;
> > > > > > > > case V4L2_PIX_FMT_QC08C: descr = "QCOM Compressed 8-bit Format"; break;
> > > > > > > > case V4L2_PIX_FMT_QC10C: descr = "QCOM Compressed 10-bit Format"; break;
> > > > > > > > + case V4L2_PIX_FMT_NV12M_V4H1C: descr = "Synaptics Compressed 8-bit tiled Format";break;
> > > > > > > > + case V4L2_PIX_FMT_NV12M_10_V4H3P8C: descr = "Synaptics Compressed 10-bit tiled Format";break;
> > > > > > > > default:
> > > > > > > > if (fmt->description[0])
> > > > > > > > return;
> > > > > > > > diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> > > > > > > > index 01e630f2ec78..7e928cb69e7c 100644
> > > > > > > > --- a/include/uapi/linux/videodev2.h
> > > > > > > > +++ b/include/uapi/linux/videodev2.h
> > > > > > > > @@ -661,6 +661,8 @@ struct v4l2_pix_format {
> > > > > > > > #define V4L2_PIX_FMT_NV12MT_16X16 v4l2_fourcc('V', 'M', '1', '2') /* 12 Y/CbCr 4:2:0 16x16 tiles */
> > > > > > > > #define V4L2_PIX_FMT_NV12M_8L128 v4l2_fourcc('N', 'A', '1', '2') /* Y/CbCr 4:2:0 8x128 tiles */
> > > > > > > > #define V4L2_PIX_FMT_NV12M_10BE_8L128 v4l2_fourcc_be('N', 'T', '1', '2') /* Y/CbCr 4:2:0 10-bit 8x128 tiles */
> > > > > > > > +#define V4L2_PIX_FMT_NV12M_V4H1C v4l2_fourcc('S', 'Y', '1', '2') /* 12 Y/CbCr 4:2:0 tiles */
> > > > > > > > +#define V4L2_PIX_FMT_NV12M_10_V4H3P8C v4l2_fourcc('S', 'Y', '1', '0') /* 12 Y/CbCr 4:2:0 10-bits tiles */
> > > > > > > >
> > > > > > > > > /* Bayer formats - see http://www.siliconimaging.com/RGB%20Bayer.htm */
> > > > > > > > #define V4L2_PIX_FMT_SBGGR8 v4l2_fourcc('B', 'A', '8', '1') /* 8 BGBG.. GRGR.. */
> > > > >
> > > >
> > >
> >
>