Re: [PATCH v3 2/2] media: docs-rst: Document memory-to-memory video encoder interface

From: Tomasz Figa
Date: Mon Apr 15 2019 - 04:56:37 EST


On Thu, Apr 11, 2019 at 1:05 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
>
> On Wednesday, April 10, 2019 at 10:50 +0200, Hans Verkuil wrote:
> > On 4/9/19 11:35 AM, Tomasz Figa wrote:
> > > On Mon, Apr 8, 2019 at 8:11 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > On 4/8/19 11:23 AM, Tomasz Figa wrote:
> > > > > On Fri, Apr 5, 2019 at 7:03 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > > > On 4/5/19 10:12 AM, Tomasz Figa wrote:
> > > > > > > On Thu, Mar 14, 2019 at 10:57 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > > > > > Hi Tomasz,
> > > > > > > >
> > > > > > > > Some more comments...
> > > > > > > >
> > > > > > > > On 1/29/19 2:52 PM, Hans Verkuil wrote:
> > > > > > > > > Hi Tomasz,
> > > > > > > > >
> > > > > > > > > Some comments below. Nothing major, so I think a v4 should be ready to be
> > > > > > > > > merged.
> > > > > > > > >
> > > > > > > > > On 1/24/19 11:04 AM, Tomasz Figa wrote:
> > > > > > > > > > Due to complexity of the video encoding process, the V4L2 drivers of
> > > > > > > > > > stateful encoder hardware require specific sequences of V4L2 API calls
> > > > > > > > > > to be followed. These include capability enumeration, initialization,
> > > > > > > > > > encoding, encode parameters change, drain and reset.
> > > > > > > > > >
> > > > > > > > > > Specifics of the above have been discussed during Media Workshops at
> > > > > > > > > > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > > > > > > > > > > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > > > > > > > > > originated at those events was later implemented by the drivers we already
> > > > > > > > > > have merged in mainline, such as s5p-mfc or coda.
> > > > > > > > > >
> > > > > > > > > > The only thing missing was the real specification included as a part of
> > > > > > > > > > Linux Media documentation. Fix it now and document the encoder part of
> > > > > > > > > > the Codec API.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx>
> > > > > > > > > > ---
> > > > > > > > > > Documentation/media/uapi/v4l/dev-encoder.rst | 586 ++++++++++++++++++
> > > > > > > > > > Documentation/media/uapi/v4l/dev-mem2mem.rst | 1 +
> > > > > > > > > > Documentation/media/uapi/v4l/pixfmt-v4l2.rst | 5 +
> > > > > > > > > > Documentation/media/uapi/v4l/v4l2.rst | 2 +
> > > > > > > > > > .../media/uapi/v4l/vidioc-encoder-cmd.rst | 38 +-
> > > > > > > > > > 5 files changed, 617 insertions(+), 15 deletions(-)
> > > > > > > > > > create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> > > > > > > > > >
> > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 000000000000..fb8b05a132ee
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> > > > > > > > > > @@ -0,0 +1,586 @@
> > > > > > > > > > +.. -*- coding: utf-8; mode: rst -*-
> > > > > > > > > > +
> > > > > > > > > > +.. _encoder:
> > > > > > > > > > +
> > > > > > > > > > +*************************************************
> > > > > > > > > > +Memory-to-memory Stateful Video Encoder Interface
> > > > > > > > > > +*************************************************
> > > > > > > > > > +
> > > > > > > > > > +A stateful video encoder takes raw video frames in display order and encodes
> > > > > > > > > > +them into a bitstream. It generates complete chunks of the bitstream, including
> > > > > > > > > > +all metadata, headers, etc. The resulting bitstream does not require any
> > > > > > > > > > +further post-processing by the client.
> > > > > > > > > > +
> > > > > > > > > > +Performing software stream processing, header generation etc. in the driver
> > > > > > > > > > +in order to support this interface is strongly discouraged. In case such
> > > > > > > > > > +operations are needed, use of the Stateless Video Encoder Interface (in
> > > > > > > > > > +development) is strongly advised.
> > > > > > > > > > +
> > > > > > > > > > +Conventions and notation used in this document
> > > > > > > > > > +==============================================
> > > > > > > > > > +
> > > > > > > > > > +1. The general V4L2 API rules apply if not specified in this document
> > > > > > > > > > + otherwise.
> > > > > > > > > > +
> > > > > > > > > > +2. The meaning of words "must", "may", "should", etc. is as per `RFC
> > > > > > > > > > + 2119 <https://tools.ietf.org/html/rfc2119>`_.
> > > > > > > > > > +
> > > > > > > > > > +3. All steps not marked "optional" are required.
> > > > > > > > > > +
> > > > > > > > > > +4. :c:func:`VIDIOC_G_EXT_CTRLS` and :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> > > > > > > > > > + interchangeably with :c:func:`VIDIOC_G_CTRL` and :c:func:`VIDIOC_S_CTRL`,
> > > > > > > > > > + unless specified otherwise.
> > > > > > > > > > +
> > > > > > > > > > +5. Single-planar API (see :ref:`planar-apis`) and applicable structures may be
> > > > > > > > > > + used interchangeably with multi-planar API, unless specified otherwise,
> > > > > > > > > > > + depending on encoder capabilities and following the general V4L2 guidelines.
> > > > > > > > > > +
> > > > > > > > > > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > > > > > > > > > + [0..2]: i = 0, 1, 2.
> > > > > > > > > > +
> > > > > > > > > > > +7. Given an ``OUTPUT`` buffer A, then A' represents a buffer on the ``CAPTURE``
> > > > > > > > > > + queue containing data that resulted from processing buffer A.
> > > > > > > > > > +
> > > > > > > > > > +Glossary
> > > > > > > > > > +========
> > > > > > > > > > +
> > > > > > > > > > +Refer to :ref:`decoder-glossary`.
> > > > > > > > > > +
> > > > > > > > > > +State machine
> > > > > > > > > > +=============
> > > > > > > > > > +
> > > > > > > > > > +.. kernel-render:: DOT
> > > > > > > > > > + :alt: DOT digraph of encoder state machine
> > > > > > > > > > + :caption: Encoder state machine
> > > > > > > > > > +
> > > > > > > > > > + digraph encoder_state_machine {
> > > > > > > > > > + node [shape = doublecircle, label="Encoding"] Encoding;
> > > > > > > > > > +
> > > > > > > > > > + node [shape = circle, label="Initialization"] Initialization;
> > > > > > > > > > + node [shape = circle, label="Stopped"] Stopped;
> > > > > > > > > > + node [shape = circle, label="Drain"] Drain;
> > > > > > > > > > + node [shape = circle, label="Reset"] Reset;
> > > > > > > > > > +
> > > > > > > > > > + node [shape = point]; qi
> > > > > > > > > > + qi -> Initialization [ label = "open()" ];
> > > > > > > > > > +
> > > > > > > > > > + Initialization -> Encoding [ label = "Both queues streaming" ];
> > > > > > > > > > +
> > > > > > > > > > > + Encoding -> Drain [ label = "V4L2_ENC_CMD_STOP" ];
> > > > > > > > > > + Encoding -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > > + Encoding -> Stopped [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
> > > > > > > > > > + Encoding -> Encoding;
> > > > > > > > > > +
> > > > > > > > > > + Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > > + Drain -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > > +
> > > > > > > > > > + Reset -> Encoding [ label = "VIDIOC_STREAMON(CAPTURE)" ];
> > > > > > > > > > + Reset -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ];
> > > > > > > > > > +
> > > > > > > > > > > + Stopped -> Encoding [ label = "V4L2_ENC_CMD_START\nor\nVIDIOC_STREAMON(OUTPUT)" ];
> > > > > > > > > > + Stopped -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > +Querying capabilities
> > > > > > > > > > +=====================
> > > > > > > > > > +
> > > > > > > > > > +1. To enumerate the set of coded formats supported by the encoder, the
> > > > > > > > > > + client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> > > > > > > > > > +
> > > > > > > > > > + * The full set of supported formats will be returned, regardless of the
> > > > > > > > > > + format set on ``OUTPUT``.
> > > > > > > > > > +
> > > > > > > > > > +2. To enumerate the set of supported raw formats, the client may call
> > > > > > > > > > + :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> > > > > > > > > > +
> > > > > > > > > > + * Only the formats supported for the format currently active on ``CAPTURE``
> > > > > > > > > > + will be returned.
> > > > > > > > > > +
> > > > > > > > > > + * In order to enumerate raw formats supported by a given coded format,
> > > > > > > > > > + the client must first set that coded format on ``CAPTURE`` and then
> > > > > > > > > > + enumerate the formats on ``OUTPUT``.
> > > > > > > > > > +
> > > > > > > > > > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > > > > > > > > > + resolutions for a given format, passing desired pixel format in
> > > > > > > > > > + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > > > > > > > > > +
> > > > > > > > > > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel
> > > > > > > > > > + format will include all possible coded resolutions supported by the
> > > > > > > > > > + encoder for given coded pixel format.
> > > > > > > > > > +
> > > > > > > > > > + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format
> > > > > > > > > > + will include all possible frame buffer resolutions supported by the
> > > > > > > > > > + encoder for given raw pixel format and coded format currently set on
> > > > > > > > > > + ``CAPTURE``.
> > > > > > > > > > +
> > > > > > > > > > +4. Supported profiles and levels for the coded format currently set on
> > > > > > > > > > + ``CAPTURE``, if applicable, may be queried using their respective controls
> > > > > > > > > > + via :c:func:`VIDIOC_QUERYCTRL`.
> > > > > > > > > > +
> > > > > > > > > > +5. Any additional encoder capabilities may be discovered by querying
> > > > > > > > > > + their respective controls.
> > > > > > > > > > +
> > > > > > > > > > +Initialization
> > > > > > > > > > +==============
> > > > > > > > > > +
> > > > > > > > > > +1. Set the coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> > > > > > > > > > +
> > > > > > > > > > + * **Required fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``type``
> > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > > > > > > > > > +
> > > > > > > > > > + ``pixelformat``
> > > > > > > > > > + the coded format to be produced
> > > > > > > > > > +
> > > > > > > > > > + ``sizeimage``
> > > > > > > > > > + desired size of ``CAPTURE`` buffers; the encoder may adjust it to
> > > > > > > > > > + match hardware requirements
> > > > > > > > > > +
> > > > > > > > > > + ``width``, ``height``
> > > > > > > > > > + ignored (always zero)
> > > > > > > > > > +
> > > > > > > > > > + other fields
> > > > > > > > > > + follow standard semantics
> > > > > > > > > > +
> > > > > > > > > > + * **Return fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``sizeimage``
> > > > > > > > > > + adjusted size of ``CAPTURE`` buffers
> > > > > > > > > > +
> > > > > > > > > > + .. important::
> > > > > > > > > > +
> > > > > > > > > > + Changing the ``CAPTURE`` format may change the currently set ``OUTPUT``
> > > > > > > > > > + format. The encoder will derive a new ``OUTPUT`` format from the
> > > > > > > > > > + ``CAPTURE`` format being set, including resolution, colorimetry
> > > > > > > > > > + parameters, etc. If the client needs a specific ``OUTPUT`` format, it
> > > > > > > > > > + must adjust it afterwards.
> > > > > > > > >
> > > > > > > > > Hmm, "including resolution": if width and height are set to 0, what should the
> > > > > > > > > OUTPUT resolution be? Up to the driver? I think this should be clarified since
> > > > > > > > > at a first reading of this paragraph it appears to be contradictory.
> > > > > > > >
> > > > > > > > I think the driver should just return the width and height of the OUTPUT
> > > > > > > > format. So the width and height that userspace specifies is just ignored
> > > > > > > > and replaced by the width and height of the OUTPUT format. After all, that's
> > > > > > > > what the bitstream will encode. Returning 0 for width and height would make
> > > > > > > > this a strange exception in V4L2 and I want to avoid that.
> > > > > > > >
> > > > > > >
> > > > > > > Hmm, however, the width and height of the OUTPUT format is not what's
> > > > > > > actually encoded in the bitstream. The right selection rectangle
> > > > > > > determines that.
> > > > > > >
> > > > > > > In one of the previous versions I thought we could put the codec
> > > > >
> > > > > s/codec/coded/...
> > > > >
> > > > > > > resolution as the width and height of the CAPTURE format, which would
> > > > > > > be the resolution of the encoded image rounded up to full macroblocks
> > > > > > > +/- some encoder-specific constraints. AFAIR there was some concern
> > > > > > > about OUTPUT format changes triggering CAPTURE format changes, but to
> > > > > > > be honest, I'm not sure if that's really a problem. I just decided to
> > > > > > > drop that for the simplicity.
> > > > > >
> > > > > > I'm not sure what your point is.
> > > > > >
> > > > > > The OUTPUT format has the coded resolution,
> > > > >
> > > > > That's not always true. The OUTPUT format is just the format of the
> > > > > source frame buffers. In special cases where the source resolution is
> > > > > nicely aligned, it would be the same as coded size, but the remaining
> > > > > cases are valid as well.
> > > > >
> > > > > > so when you set the
> > > > > > CAPTURE format it can just copy the OUTPUT coded resolution unless the
> > > > > > chosen CAPTURE pixelformat can't handle that in which case both the
> > > > > > OUTPUT and CAPTURE coded resolutions are clamped to whatever is the maximum
> > > > > > or minimum the codec is capable of.
> > > > >
> > > > > As per my comment above, generally speaking, the encoder will derive
> > > > > an appropriate coded format from the OUTPUT format, but also other
> > > > > factors, like the crop rectangles and possibly some internal
> > > > > constraints.
> > > > >
> > > > > > That said, I am fine with just leaving it up to the driver as suggested
> > > > > > before. Just as long as both the CAPTURE and OUTPUT formats remain valid
> > > > > > (i.e. width and height may never be out of range).
> > > > > >
> > > > >
> > > > > Sounds good to me.
> > > > >
> > > > > > > > > > +
> > > > > > > > > > +2. **Optional.** Enumerate supported ``OUTPUT`` formats (raw formats for
> > > > > > > > > > + source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> > > > > > > > > > +
> > > > > > > > > > + * **Required fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``type``
> > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > > > > > > > +
> > > > > > > > > > + other fields
> > > > > > > > > > + follow standard semantics
> > > > > > > > > > +
> > > > > > > > > > + * **Return fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``pixelformat``
> > > > > > > > > > + raw format supported for the coded format currently selected on
> > > > > > > > > > + the ``CAPTURE`` queue.
> > > > > > > > > > +
> > > > > > > > > > + other fields
> > > > > > > > > > + follow standard semantics
> > > > > > > > > > +
> > > > > > > > > > +3. Set the raw source format on the ``OUTPUT`` queue via
> > > > > > > > > > + :c:func:`VIDIOC_S_FMT`.
> > > > > > > > > > +
> > > > > > > > > > + * **Required fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``type``
> > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > > > > > > > +
> > > > > > > > > > + ``pixelformat``
> > > > > > > > > > + raw format of the source
> > > > > > > > > > +
> > > > > > > > > > + ``width``, ``height``
> > > > > > > > > > + source resolution
> > > > > > > > > > +
> > > > > > > > > > + other fields
> > > > > > > > > > + follow standard semantics
> > > > > > > > > > +
> > > > > > > > > > + * **Return fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``width``, ``height``
> > > > > > > > > > + may be adjusted by encoder to match alignment requirements, as
> > > > > > > > > > + required by the currently selected formats
> > > > > > > > >
> > > > > > > > > What if the width x height is larger than the maximum supported by the
> > > > > > > > > selected coded format? This should probably mention that in that case the
> > > > > > > > > width x height is reduced to the largest allowed value. Also mention that
> > > > > > > > > this maximum is reported by VIDIOC_ENUM_FRAMESIZES.
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > + other fields
> > > > > > > > > > + follow standard semantics
> > > > > > > > > > +
> > > > > > > > > > + * Setting the source resolution will reset the selection rectangles to their
> > > > > > > > > > + default values, based on the new resolution, as described in the step 5
> > > > > > > > >
> > > > > > > > > 5 -> 4
> > > > > > > > >
> > > > > > > > > Or just say: "as described in the next step."
> > > > > > > > >
> > > > > > > > > > + below.
> > > > > > > >
> > > > > > > > It should also be made explicit that:
> > > > > > > >
> > > > > > > > 1) the crop rectangle will be set to the given width and height *before*
> > > > > > > > it is being adjusted by S_FMT.
> > > > > > > >
> > > > > > >
> > > > > > > I don't think that's what we want here.
> > > > > > >
> > > > > > > Defining the default rectangle to be exactly the same as the OUTPUT
> > > > > > > resolution (after the adjustment) makes the semantics consistent - not
> > > > > > > setting the crop rectangle gives you exactly the behavior as if there
> > > > > > > was no cropping involved (or supported by the encoder).
> > > > > >
> > > > > > I think you are right. This seems to be what the coda driver does as well.
> > > > > > It is convenient to be able to just set a 1920x1080 format and have that
> > > > > > resolution be stored as the crop rectangle, since it avoids having to call
> > > > > > s_selection afterwards, but it is not really consistent with the way V4L2
> > > > > > works.
> > > > > >
> > > > > > > > Open question: should we support a compose rectangle for the CAPTURE that
> > > > > > > > is the same as the OUTPUT crop rectangle? I.e. the CAPTURE format contains
> > > > > > > > the adjusted width and height and the compose rectangle (read-only) contains
> > > > > > > > the visible width and height. It's not strictly necessary, but it is
> > > > > > > > symmetrical.
> > > > > > >
> > > > > > > Wouldn't it rather be the CAPTURE crop rectangle that would be of the
> > > > > > > same resolution of the OUTPUT compose rectangle? Then you could
> > > > > > > actually have the CAPTURE compose rectangle for putting that into the
> > > > > > > desired rectangle of the encoded stream, if the encoder supports that.
> > > > > > > (I don't know any that does, so probably out of concern for now.)
> > > > > >
> > > > > > Yes, you are right.
> > > > > >
> > > > > > But should we support this?
> > > > > >
> > > > > > I actually think not for this initial version. It can be added later, I guess.
> > > > > >
> > > > >
> > > > > I think it boils down to whether adding it later wouldn't
> > > > > significantly complicate the application logic. It also relates to my
> > > > > other comment somewhere below.
> > > > >
> > > > > > > > 2) the CAPTURE format will be updated as well with the new OUTPUT width and
> > > > > > > > height. The CAPTURE sizeimage might change as well.
> > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +4. **Optional.** Set the visible resolution for the stream metadata via
> > > > > > > > > > + :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> > > > > > > >
> > > > > > > > I think you should mention that this is only necessary if the crop rectangle
> > > > > > > > that is set when you set the format isn't what you want.
> > > > > > > >
> > > > > > >
> > > > > > > Ack.
> > > > > > >
> > > > > > > > > > +
> > > > > > > > > > + * **Required fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``type``
> > > > > > > > > > + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > > > > > > > +
> > > > > > > > > > + ``target``
> > > > > > > > > > + set to ``V4L2_SEL_TGT_CROP``
> > > > > > > > > > +
> > > > > > > > > > + ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > > > > > > > > > > + visible rectangle; this must fit within the ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > > > > > > > > + rectangle and may be subject to adjustment to match codec and
> > > > > > > > > > + hardware constraints
> > > > > > > > > > +
> > > > > > > > > > + * **Return fields:**
> > > > > > > > > > +
> > > > > > > > > > + ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > > > > > > > > > + visible rectangle adjusted by the encoder
> > > > > > > > > > +
> > > > > > > > > > + * The following selection targets are supported on ``OUTPUT``:
> > > > > > > > > > +
> > > > > > > > > > + ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > > > > > > > > + equal to the full source frame, matching the active ``OUTPUT``
> > > > > > > > > > + format
> > > > > > > > > > +
> > > > > > > > > > + ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > > > > > > > > + equal to ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > > > > > > > > +
> > > > > > > > > > + ``V4L2_SEL_TGT_CROP``
> > > > > > > > > > + rectangle within the source buffer to be encoded into the
> > > > > > > > > > + ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > > > > > > > > +
> > > > > > > > > > + .. note::
> > > > > > > > > > +
> > > > > > > > > > + A common use case for this selection target is encoding a source
> > > > > > > > > > + video with a resolution that is not a multiple of a macroblock,
> > > > > > > > > > + e.g. the common 1920x1080 resolution may require the source
> > > > > > > > > > + buffers to be aligned to 1920x1088 for codecs with 16x16 macroblock
> > > > > > > > > > + size. To avoid encoding the padding, the client needs to explicitly
> > > > > > > > > > + configure this selection target to 1920x1080.
> > > > > > > >
> > > > > > > > This last sentence contradicts the proposed behavior of S_FMT(OUTPUT).
> > > > > > > >
> > > > > > >
> > > > > > > Sorry, which part exactly and what part of the proposal exactly? :)
> > > > > > > (My comment above might be related, though.)
> > > > > >
> > > > > > Ignore my comment. We go back to explicitly requiring userspace to set the OUTPUT
> > > > > > crop selection target, so this note remains valid.
> > > > > >
> > > > >
> > > > > Ack.
> > > > >
> > > > > > > > > > +
> > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > > > > > > > > > + maximum rectangle within the coded resolution, which the cropped
> > > > > > > > > > + source frame can be composed into; if the hardware does not support
> > > > > > > > > > + composition or scaling, then this is always equal to the rectangle of
> > > > > > > > > > + width and height matching ``V4L2_SEL_TGT_CROP`` and located at (0, 0)
> > > > > > > > > > +
> > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > > > > > > > > > + equal to a rectangle of width and height matching
> > > > > > > > > > + ``V4L2_SEL_TGT_CROP`` and located at (0, 0)
> > > > > > > > > > +
> > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE``
> > > > > > > > > > + rectangle within the coded frame, which the cropped source frame
> > > > > > > > > > + is to be composed into; defaults to
> > > > > > > > > > + ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> > > > > > > > > > + additional compose/scaling capabilities; resulting stream will
> > > > > > > > > > + have this rectangle encoded as the visible rectangle in its
> > > > > > > > > > + metadata
> > > > > > > >
> > > > > > > > I think the compose targets for OUTPUT are only needed if the hardware can
> > > > > > > > actually do scaling and/or composition. Otherwise they can (must?) be
> > > > > > > > dropped.
> > > > > > > >
> > > > > > >
> > > > > > > Note that V4L2_SEL_TGT_COMPOSE is defined to be the way for the
> > > > > > > userspace to learn the target visible rectangle that's going to be
> > > > > > > encoded in the stream metadata. If we omit it, we wouldn't have a way
> > > > > > > that would be consistent between encoders that can do
> > > > > > > scaling/composition and those that can't.
> > > > > >
> > > > > > I'm not convinced about this. The standard API behavior is not to expose
> > > > > > functionality that the hardware can't do. So if scaling isn't possible on
> > > > > > the OUTPUT side, then it shouldn't expose OUTPUT compose rectangles.
> > > > > >
> > > > > > I also believe it very unlikely that we'll see encoders capable of scaling
> > > > > > as it doesn't make much sense.
> > > > >
> > > > > It does make a lot of sense - WebRTC requires 3 different sizes of the
> > > > > stream to be encoded at the same time. However, unfortunately, I
> > > > > haven't yet seen an encoder capable of doing so.
> > > > >
> > > > > > I would prefer to drop this to simplify the
> > > > > > spec, and when we get encoders that can scale, then we can add support for
> > > > > > compose rectangles (and I'm sure we'll need to think about how that
> > > > > > influences the CAPTURE side as well).
> > > > > >
> > > > > > For encoders without scaling it is the OUTPUT crop rectangle that defines
> > > > > > the visible rectangle.
> > > > > >
> > > > > > > However, with your proposal of actually having selection rectangles
> > > > > > > for the CAPTURE queue, it could be solved indeed. The OUTPUT queue
> > > > > > > would expose a varying set of rectangles, depending on the hardware
> > > > > > > capability, while the CAPTURE queue would always expose its rectangle
> > > > > > > with that information.
> > > > > >
> > > > > > I think we should keep it simple and only define selection rectangles
> > > > > > when really needed.
> > > > > >
> > > > > > So encoders support CROP on the OUTPUT, and decoders support CAPTURE
> > > > > > COMPOSE (may be read-only). Nothing else.
> > > > > >
> > > > > > Once support for scaling is needed (either on the encoder or decoder
> > > > > > side), then the spec should be enhanced. But I prefer to postpone that
> > > > > > until we actually have hardware that needs this.
> > > > > >
> > > > >
> > > > > Okay, let's do it this way then. Actually, I don't even think there is
> > > > > much value in exposing information internal to the bitstream metadata
> > > > > like this, similarly to the coded size. My intention was to just
> > > > > ensure that we can easily add scaling/composing functionality later.
> > > > >
> > > > > I just removed the COMPOSE rectangles from my next draft.
> > > >
> > > > I don't think that supporting scaling will be a problem for the API as
> > > > such, since this is supported for standard video capture devices. It
> > > > just gets very complicated trying to describe how to configure all this.
> > > >
> > > > So I prefer to avoid this until we need to.
> > > >
> > > > > [snip]
> > > > > > > > Changing the OUTPUT format will always fail if OUTPUT buffers are already allocated,
> > > > > > > > or if changing the OUTPUT format would change the CAPTURE format (sizeimage in
> > > > > > > > particular) and CAPTURE buffers were already allocated and are too small.
> > > > > > >
> > > > > > > The OUTPUT format must not change the CAPTURE format by definition.
> > > > > > > Otherwise we end up in a situation where we can't commit, because both
> > > > > > > queue formats can affect each other. Any change to the OUTPUT format
> > > > > > > that wouldn't work with the current CAPTURE format should be adjusted
> > > > > > > by the driver to match the current CAPTURE format.
> > > > > >
> > > > > > But the CAPTURE format *does* depend on the OUTPUT format: if the output
> > > > > > resolution changes, then so does the CAPTURE resolution and esp. the
> > > > > > sizeimage value, since that is typically resolution dependent.
> > > > > >
> > > > > > The coda driver does this as well: changing the output resolution
> > > > > > will update the capture resolution and sizeimage. The vicodec driver does the
> > > > > > same.
> > > > > >
> > > > > > Setting the CAPTURE format basically just selects the codec to use, after
> > > > > > that you can set the OUTPUT format and read the updated CAPTURE format to
> > > > > > get the new sizeimage value. In fact, setting the CAPTURE format shouldn't
> > > > > > change the OUTPUT format, unless the OUTPUT format is incompatible with the
> > > > > > newly selected codec.
> > > > >
> > > > > Let me think about it for a while.
> > > >
> > > > Sleep on it, always works well for me :-)
> > >
> > > Okay, I think I'm not convinced.
> > >
> > > I believe we decided to allow sizeimage to be specified by the
> > > application, because it knows more about the stream it's going to
> > > encode. Only setting the size to 0 would make the encoder fall back to
> > > some simple internal heuristic.
> >
> > Yes, that was the plan, but the patch stalled. I completely forgot
> > about this patch :-)
> >
> > My last reply to "Re: [RFC PATCH] media/doc: Allow sizeimage to be set by
> > v4l clients" was March 14th.
> >
> > Also, sizeimage must be at least the minimum size required for the given
> > CAPTURE width and height. So if it is less, then sizeimage will be set to that
> > minimum size.
> >
> > > Another thing is handling resolution changes. I believe that would
> > > have to be handled by stopping the OUTPUT queue, changing the OUTPUT
> > > format and starting the OUTPUT queue, all that without stopping the
> > > CAPTURE queue. With the behavior you described it wouldn't work,
> > > because the OUTPUT format couldn't be changed.
> > >
> > > I'd suggest making OUTPUT format changes not change the CAPTURE sizeimage.
> >
> > So OUTPUT format changes will still update the CAPTURE width and height?
> >
> > It's kind of weird if you are encoding e.g. 1920x1080 but the CAPTURE format
> > says 1280x720. I'm not sure what is best.
> >
> > What if the CAPTURE sizeimage is too small for the new OUTPUT resolution?
> > Should S_FMT(OUTPUT) fail with some error in that case?
>
> Sounds like we need something similar to the SOURCE_CHANGE event
> mechanism if we want to allow dynamic bitrate control which would
> require re-allocation of the capture buffer queue. (Or any other
> runtime control on our encoders, which is really expected to be
> supported these days).

Sounds like it. Or we could just assume that one needs to stop both
queues to do a resolution change, since most codecs would anyway reset
the stream (e.g. send PPS/SPS, etc. for H.264) to change the
resolution. Not sure if that assumption always holds, though.

Best regards,
Tomasz