Re: [PATCH v3 2/2] media: docs-rst: Document memory-to-memory video encoder interface

From: Hans Verkuil
Date: Mon Apr 08 2019 - 07:11:18 EST


On 4/8/19 11:23 AM, Tomasz Figa wrote:
> On Fri, Apr 5, 2019 at 7:03 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
>>
>> On 4/5/19 10:12 AM, Tomasz Figa wrote:
>>> On Thu, Mar 14, 2019 at 10:57 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
>>>>
>>>> Hi Tomasz,
>>>>
>>>> Some more comments...
>>>>
>>>> On 1/29/19 2:52 PM, Hans Verkuil wrote:
>>>>> Hi Tomasz,
>>>>>
>>>>> Some comments below. Nothing major, so I think a v4 should be ready to be
>>>>> merged.
>>>>>
>>>>> On 1/24/19 11:04 AM, Tomasz Figa wrote:
>>>>>> Due to complexity of the video encoding process, the V4L2 drivers of
>>>>>> stateful encoder hardware require specific sequences of V4L2 API calls
>>>>>> to be followed. These include capability enumeration, initialization,
>>>>>> encoding, encoding parameter changes, drain and reset.
>>>>>>
>>>>>> Specifics of the above have been discussed during Media Workshops at
>>>>>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
>>>>>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
>>>>>> originated at those events was later implemented by the drivers we already
>>>>>> have merged in mainline, such as s5p-mfc or coda.
>>>>>>
>>>>>> The only thing missing was the real specification included as a part of
>>>>>> Linux Media documentation. Fix it now and document the encoder part of
>>>>>> the Codec API.
>>>>>>
>>>>>> Signed-off-by: Tomasz Figa <tfiga@xxxxxxxxxxxx>
>>>>>> ---
>>>>>> Documentation/media/uapi/v4l/dev-encoder.rst | 586 ++++++++++++++++++
>>>>>> Documentation/media/uapi/v4l/dev-mem2mem.rst | 1 +
>>>>>> Documentation/media/uapi/v4l/pixfmt-v4l2.rst | 5 +
>>>>>> Documentation/media/uapi/v4l/v4l2.rst | 2 +
>>>>>> .../media/uapi/v4l/vidioc-encoder-cmd.rst | 38 +-
>>>>>> 5 files changed, 617 insertions(+), 15 deletions(-)
>>>>>> create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
>>>>>>
>>>>>> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
>>>>>> new file mode 100644
>>>>>> index 000000000000..fb8b05a132ee
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
>>>>>> @@ -0,0 +1,586 @@
>>>>>> +.. -*- coding: utf-8; mode: rst -*-
>>>>>> +
>>>>>> +.. _encoder:
>>>>>> +
>>>>>> +*************************************************
>>>>>> +Memory-to-memory Stateful Video Encoder Interface
>>>>>> +*************************************************
>>>>>> +
>>>>>> +A stateful video encoder takes raw video frames in display order and encodes
>>>>>> +them into a bitstream. It generates complete chunks of the bitstream, including
>>>>>> +all metadata, headers, etc. The resulting bitstream does not require any
>>>>>> +further post-processing by the client.
>>>>>> +
>>>>>> +Performing software stream processing, header generation etc. in the driver
>>>>>> +in order to support this interface is strongly discouraged. In case such
>>>>>> +operations are needed, use of the Stateless Video Encoder Interface (in
>>>>>> +development) is strongly advised.
>>>>>> +
>>>>>> +Conventions and notation used in this document
>>>>>> +==============================================
>>>>>> +
>>>>>> +1. The general V4L2 API rules apply if not specified in this document
>>>>>> + otherwise.
>>>>>> +
>>>>>> +2. The meaning of words "must", "may", "should", etc. is as per `RFC
>>>>>> + 2119 <https://tools.ietf.org/html/rfc2119>`_.
>>>>>> +
>>>>>> +3. All steps not marked "optional" are required.
>>>>>> +
>>>>>> +4. :c:func:`VIDIOC_G_EXT_CTRLS` and :c:func:`VIDIOC_S_EXT_CTRLS` may be used
>>>>>> + interchangeably with :c:func:`VIDIOC_G_CTRL` and :c:func:`VIDIOC_S_CTRL`,
>>>>>> + unless specified otherwise.
>>>>>> +
>>>>>> +5. Single-planar API (see :ref:`planar-apis`) and applicable structures may be
>>>>>> + used interchangeably with multi-planar API, unless specified otherwise,
>>>>>> + depending on encoder capabilities and following the general V4L2 guidelines.
>>>>>> +
>>>>>> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
>>>>>> + [0..2]: i = 0, 1, 2.
>>>>>> +
>>>>>> +7. Given an ``OUTPUT`` buffer A, then A' represents a buffer on the ``CAPTURE``
>>>>>> + queue containing data that resulted from processing buffer A.
>>>>>> +
>>>>>> +Glossary
>>>>>> +========
>>>>>> +
>>>>>> +Refer to :ref:`decoder-glossary`.
>>>>>> +
>>>>>> +State machine
>>>>>> +=============
>>>>>> +
>>>>>> +.. kernel-render:: DOT
>>>>>> + :alt: DOT digraph of encoder state machine
>>>>>> + :caption: Encoder state machine
>>>>>> +
>>>>>> + digraph encoder_state_machine {
>>>>>> + node [shape = doublecircle, label="Encoding"] Encoding;
>>>>>> +
>>>>>> + node [shape = circle, label="Initialization"] Initialization;
>>>>>> + node [shape = circle, label="Stopped"] Stopped;
>>>>>> + node [shape = circle, label="Drain"] Drain;
>>>>>> + node [shape = circle, label="Reset"] Reset;
>>>>>> +
>>>>>> + node [shape = point]; qi
>>>>>> + qi -> Initialization [ label = "open()" ];
>>>>>> +
>>>>>> + Initialization -> Encoding [ label = "Both queues streaming" ];
>>>>>> +
>>>>>> + Encoding -> Drain [ label = "V4L2_ENC_CMD_STOP" ];
>>>>>> + Encoding -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
>>>>>> + Encoding -> Stopped [ label = "VIDIOC_STREAMOFF(OUTPUT)" ];
>>>>>> + Encoding -> Encoding;
>>>>>> +
>>>>>> + Drain -> Stopped [ label = "All CAPTURE\nbuffers dequeued\nor\nVIDIOC_STREAMOFF(CAPTURE)" ];
>>>>>> + Drain -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
>>>>>> +
>>>>>> + Reset -> Encoding [ label = "VIDIOC_STREAMON(CAPTURE)" ];
>>>>>> + Reset -> Initialization [ label = "VIDIOC_REQBUFS(OUTPUT, 0)" ];
>>>>>> +
>>>>>> + Stopped -> Encoding [ label = "V4L2_ENC_CMD_START\nor\nVIDIOC_STREAMON(OUTPUT)" ];
>>>>>> + Stopped -> Reset [ label = "VIDIOC_STREAMOFF(CAPTURE)" ];
>>>>>> + }
>>>>>> +
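
(Side note, not a change request: the Drain path in the diagram above maps
to roughly the following userspace sequence. This is only a sketch, using
the single-planar API and MMAP buffers for brevity, with error handling
omitted; "fd" is the open encoder node, the usual <linux/videodev2.h>,
<sys/ioctl.h> and <string.h> includes are assumed, both queues are assumed
to be streaming already, and the encoder is assumed to mark the final
bitstream buffer with V4L2_BUF_FLAG_LAST.)

        struct v4l2_encoder_cmd cmd = { .cmd = V4L2_ENC_CMD_STOP };
        struct v4l2_buffer buf;

        ioctl(fd, VIDIOC_ENCODER_CMD, &cmd);      /* Encoding -> Drain */

        for (;;) {
                memset(&buf, 0, sizeof(buf));
                buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory = V4L2_MEMORY_MMAP;
                ioctl(fd, VIDIOC_DQBUF, &buf);    /* blocking dequeue */

                /* ... consume buf.bytesused bytes of bitstream ... */

                if (buf.flags & V4L2_BUF_FLAG_LAST)
                        break;                    /* Drain -> Stopped */
        }
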
>>>>>> +Querying capabilities
>>>>>> +=====================
>>>>>> +
>>>>>> +1. To enumerate the set of coded formats supported by the encoder, the
>>>>>> + client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
>>>>>> +
>>>>>> + * The full set of supported formats will be returned, regardless of the
>>>>>> + format set on ``OUTPUT``.
>>>>>> +
>>>>>> +2. To enumerate the set of supported raw formats, the client may call
>>>>>> + :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
>>>>>> +
>>>>>> + * Only the formats supported for the format currently active on ``CAPTURE``
>>>>>> + will be returned.
>>>>>> +
>>>>>> + * In order to enumerate raw formats supported by a given coded format,
>>>>>> + the client must first set that coded format on ``CAPTURE`` and then
>>>>>> + enumerate the formats on ``OUTPUT``.
>>>>>> +
>>>>>> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
>>>>>> + resolutions for a given format, passing desired pixel format in
>>>>>> + :c:type:`v4l2_frmsizeenum` ``pixel_format``.
>>>>>> +
>>>>>> + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a coded pixel
>>>>>> + format will include all possible coded resolutions supported by the
>>>>>> + encoder for the given coded pixel format.
>>>>>> +
>>>>>> + * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` for a raw pixel format
>>>>>> + will include all possible frame buffer resolutions supported by the
>>>>>> + encoder for the given raw pixel format and the coded format currently set on
>>>>>> + ``CAPTURE``.
>>>>>> +
>>>>>> +4. Supported profiles and levels for the coded format currently set on
>>>>>> + ``CAPTURE``, if applicable, may be queried using their respective controls
>>>>>> + via :c:func:`VIDIOC_QUERYCTRL`.
>>>>>> +
>>>>>> +5. Any additional encoder capabilities may be discovered by querying
>>>>>> + their respective controls.
>>>>>> +
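
To make the enumeration order above concrete, here is a rough,
non-normative sketch of a client following steps 1-3 (single-planar API,
"fd" is the open encoder node, most error handling omitted):

        struct v4l2_fmtdesc fmt;
        struct v4l2_frmsizeenum frmsize;
        int i;

        /* Step 1: coded formats on CAPTURE, independent of OUTPUT. */
        for (i = 0; ; i++) {
                memset(&fmt, 0, sizeof(fmt));
                fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                fmt.index = i;
                if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt))
                        break;

                /* Step 3: first frame size entry for this coded format. */
                memset(&frmsize, 0, sizeof(frmsize));
                frmsize.pixel_format = fmt.pixelformat;
                ioctl(fd, VIDIOC_ENUM_FRAMESIZES, &frmsize);
        }

        /* Step 2: raw formats on OUTPUT; only those usable with the
         * coded format currently set on CAPTURE are returned, so this
         * is typically done after S_FMT(CAPTURE). */
        for (i = 0; ; i++) {
                memset(&fmt, 0, sizeof(fmt));
                fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
                fmt.index = i;
                if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt))
                        break;
                /* fmt.pixelformat is a raw format usable as the source. */
        }
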
>>>>>> +Initialization
>>>>>> +==============
>>>>>> +
>>>>>> +1. Set the coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
>>>>>> +
>>>>>> + * **Required fields:**
>>>>>> +
>>>>>> + ``type``
>>>>>> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
>>>>>> +
>>>>>> + ``pixelformat``
>>>>>> + the coded format to be produced
>>>>>> +
>>>>>> + ``sizeimage``
>>>>>> + desired size of ``CAPTURE`` buffers; the encoder may adjust it to
>>>>>> + match hardware requirements
>>>>>> +
>>>>>> + ``width``, ``height``
>>>>>> + ignored (always zero)
>>>>>> +
>>>>>> + other fields
>>>>>> + follow standard semantics
>>>>>> +
>>>>>> + * **Return fields:**
>>>>>> +
>>>>>> + ``sizeimage``
>>>>>> + adjusted size of ``CAPTURE`` buffers
>>>>>> +
>>>>>> + .. important::
>>>>>> +
>>>>>> + Changing the ``CAPTURE`` format may change the currently set ``OUTPUT``
>>>>>> + format. The encoder will derive a new ``OUTPUT`` format from the
>>>>>> + ``CAPTURE`` format being set, including resolution, colorimetry
>>>>>> + parameters, etc. If the client needs a specific ``OUTPUT`` format, it
>>>>>> + must adjust it afterwards.
>>>>>
>>>>> Hmm, "including resolution": if width and height are set to 0, what should the
>>>>> OUTPUT resolution be? Up to the driver? I think this should be clarified since
>>>>> at a first reading of this paragraph it appears to be contradictory.
>>>>
>>>> I think the driver should just return the width and height of the OUTPUT
>>>> format. So the width and height that userspace specifies is just ignored
>>>> and replaced by the width and height of the OUTPUT format. After all, that's
>>>> what the bitstream will encode. Returning 0 for width and height would make
>>>> this a strange exception in V4L2 and I want to avoid that.
>>>>
>>>
>>> Hmm, however, the width and height of the OUTPUT format is not what's
>>> actually encoded in the bitstream. The right selection rectangle
>>> determines that.
>>>
>>> In one of the previous versions I thought we could put the codec
>
> s/codec/coded/...
>
>>> resolution as the width and height of the CAPTURE format, which would
>>> be the resolution of the encoded image rounded up to full macroblocks
>>> +/- some encoder-specific constraints. AFAIR there was some concern
>>> about OUTPUT format changes triggering CAPTURE format changes, but to
>>> be honest, I'm not sure if that's really a problem. I just decided to
>>> drop that for simplicity.
>>
>> I'm not sure what your point is.
>>
>> The OUTPUT format has the coded resolution,
>
> That's not always true. The OUTPUT format is just the format of the
> source frame buffers. In special cases where the source resolution is
> nicely aligned, it would be the same as the coded size, but the remaining
> cases are valid as well.
>
>> so when you set the
>> CAPTURE format it can just copy the OUTPUT coded resolution unless the
>> chosen CAPTURE pixelformat can't handle that in which case both the
>> OUTPUT and CAPTURE coded resolutions are clamped to whatever is the maximum
>> or minimum the codec is capable of.
>
> As per my comment above, generally speaking, the encoder will derive
> an appropriate coded format from the OUTPUT format, but also from other
> factors, like the crop rectangles and possibly some internal
> constraints.
>
>>
>> That said, I am fine with just leaving it up to the driver as suggested
>> before. Just as long as both the CAPTURE and OUTPUT formats remain valid
>> (i.e. width and height may never be out of range).
>>
>
> Sounds good to me.
>
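
FWIW, with that behavior step 1 of the initialization sequence would look
roughly like this in code (a sketch only; single-planar API, H.264 picked
as an arbitrary example, "fd" is the encoder node, error handling omitted):

        struct v4l2_format fmt;

        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
        fmt.fmt.pix.sizeimage = 1024 * 1024;   /* only a hint */
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        /* On return, fmt.fmt.pix.sizeimage holds the adjusted buffer
         * size, fmt.fmt.pix.width/height hold whatever the driver
         * chose, and the OUTPUT format may have been updated too. */
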
>>>
>>>>>
>>>>>> +
>>>>>> +2. **Optional.** Enumerate supported ``OUTPUT`` formats (raw formats for
>>>>>> + source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
>>>>>> +
>>>>>> + * **Required fields:**
>>>>>> +
>>>>>> + ``type``
>>>>>> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
>>>>>> +
>>>>>> + other fields
>>>>>> + follow standard semantics
>>>>>> +
>>>>>> + * **Return fields:**
>>>>>> +
>>>>>> + ``pixelformat``
>>>>>> + raw format supported for the coded format currently selected on
>>>>>> + the ``CAPTURE`` queue.
>>>>>> +
>>>>>> + other fields
>>>>>> + follow standard semantics
>>>>>> +
>>>>>> +3. Set the raw source format on the ``OUTPUT`` queue via
>>>>>> + :c:func:`VIDIOC_S_FMT`.
>>>>>> +
>>>>>> + * **Required fields:**
>>>>>> +
>>>>>> + ``type``
>>>>>> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
>>>>>> +
>>>>>> + ``pixelformat``
>>>>>> + raw format of the source
>>>>>> +
>>>>>> + ``width``, ``height``
>>>>>> + source resolution
>>>>>> +
>>>>>> + other fields
>>>>>> + follow standard semantics
>>>>>> +
>>>>>> + * **Return fields:**
>>>>>> +
>>>>>> + ``width``, ``height``
>>>>>> + may be adjusted by the encoder to match alignment requirements, as
>>>>>> + required by the currently selected formats
>>>>>
>>>>> What if the width x height is larger than the maximum supported by the
>>>>> selected coded format? This should probably mention that in that case the
>>>>> width x height is reduced to the largest allowed value. Also mention that
>>>>> this maximum is reported by VIDIOC_ENUM_FRAMESIZES.
>>>>>
>>>>>> +
>>>>>> + other fields
>>>>>> + follow standard semantics
>>>>>> +
>>>>>> + * Setting the source resolution will reset the selection rectangles to their
>>>>>> + default values, based on the new resolution, as described in the step 5
>>>>>
>>>>> 5 -> 4
>>>>>
>>>>> Or just say: "as described in the next step."
>>>>>
>>>>>> + below.
>>>>
>>>> It should also be made explicit that:
>>>>
>>>> 1) the crop rectangle will be set to the given width and height *before*
>>>> it is adjusted by S_FMT.
>>>>
>>>
>>> I don't think that's what we want here.
>>>
>>> Defining the default rectangle to be exactly the same as the OUTPUT
>>> resolution (after the adjustment) makes the semantics consistent: not
>>> setting the crop rectangle gives exactly the same behavior as if no
>>> cropping was involved (or supported by the encoder).
>>
>> I think you are right. This seems to be what the coda driver does as well.
>> It is convenient to be able to just set a 1920x1080 format and have that
>> resolution be stored as the crop rectangle, since it avoids having to call
>> s_selection afterwards, but it is not really consistent with the way V4L2
>> works.
>>
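
Spelled out as a sketch (single-planar API, NV12 and 1920x1080 chosen
purely as an example, error handling omitted), the semantics we are
agreeing on would then be:

        struct v4l2_format fmt;
        struct v4l2_selection sel;

        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;
        fmt.fmt.pix.width = 1920;
        fmt.fmt.pix.height = 1080;
        ioctl(fd, VIDIOC_S_FMT, &fmt);
        /* The driver may align the resolution, e.g. to 1920x1088. */

        memset(&sel, 0, sizeof(sel));
        sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        sel.target = V4L2_SEL_TGT_CROP;
        ioctl(fd, VIDIOC_G_SELECTION, &sel);
        /* sel.r now matches the *adjusted* OUTPUT resolution (e.g.
         * 1920x1088), i.e. the default crop follows the format, not
         * the values originally passed to S_FMT. */
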
>>>
>>>> Open question: should we support a compose rectangle for the CAPTURE that
>>>> is the same as the OUTPUT crop rectangle? I.e. the CAPTURE format contains
>>>> the adjusted width and height and the compose rectangle (read-only) contains
>>>> the visible width and height. It's not strictly necessary, but it is
>>>> symmetrical.
>>>
>>> Wouldn't it rather be the CAPTURE crop rectangle that would be of the
>>> same resolution as the OUTPUT compose rectangle? Then you could
>>> actually have the CAPTURE compose rectangle for putting that into the
>>> desired rectangle of the encoded stream, if the encoder supports that.
>>> (I don't know any that does, so probably not a concern for now.)
>>
>> Yes, you are right.
>>
>> But should we support this?
>>
>> I actually think not for this initial version. It can be added later, I guess.
>>
>
> I think it boils down to whether adding it later wouldn't
> significantly complicate the application logic. It also relates to my
> other comment somewhere below.
>
>>>
>>>>
>>>> 2) the CAPTURE format will be updated as well with the new OUTPUT width and
>>>> height. The CAPTURE sizeimage might change as well.
>>>>
>>>>>> +
>>>>>> +4. **Optional.** Set the visible resolution for the stream metadata via
>>>>>> + :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
>>>>
>>>> I think you should mention that this is only necessary if the crop rectangle
>>>> that is set when you set the format isn't what you want.
>>>>
>>>
>>> Ack.
>>>
>>>>>> +
>>>>>> + * **Required fields:**
>>>>>> +
>>>>>> + ``type``
>>>>>> + a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
>>>>>> +
>>>>>> + ``target``
>>>>>> + set to ``V4L2_SEL_TGT_CROP``
>>>>>> +
>>>>>> + ``r.left``, ``r.top``, ``r.width``, ``r.height``
>>>>>> + visible rectangle; this must fit within the ``V4L2_SEL_TGT_CROP_BOUNDS``
>>>>>> + rectangle and may be subject to adjustment to match codec and
>>>>>> + hardware constraints
>>>>>> +
>>>>>> + * **Return fields:**
>>>>>> +
>>>>>> + ``r.left``, ``r.top``, ``r.width``, ``r.height``
>>>>>> + visible rectangle adjusted by the encoder
>>>>>> +
>>>>>> + * The following selection targets are supported on ``OUTPUT``:
>>>>>> +
>>>>>> + ``V4L2_SEL_TGT_CROP_BOUNDS``
>>>>>> + equal to the full source frame, matching the active ``OUTPUT``
>>>>>> + format
>>>>>> +
>>>>>> + ``V4L2_SEL_TGT_CROP_DEFAULT``
>>>>>> + equal to ``V4L2_SEL_TGT_CROP_BOUNDS``
>>>>>> +
>>>>>> + ``V4L2_SEL_TGT_CROP``
>>>>>> + rectangle within the source buffer to be encoded into the
>>>>>> + ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
>>>>>> +
>>>>>> + .. note::
>>>>>> +
>>>>>> + A common use case for this selection target is encoding a source
>>>>>> + video with a resolution that is not a multiple of a macroblock,
>>>>>> + e.g. the common 1920x1080 resolution may require the source
>>>>>> + buffers to be aligned to 1920x1088 for codecs with 16x16 macroblock
>>>>>> + size. To avoid encoding the padding, the client needs to explicitly
>>>>>> + configure this selection target to 1920x1080.
>>>>
>>>> This last sentence contradicts the proposed behavior of S_FMT(OUTPUT).
>>>>
>>>
>>> Sorry, which part exactly and what part of the proposal exactly? :)
>>> (My comment above might be related, though.)
>>
>> Ignore my comment. We go back to explicitly requiring userspace to set the OUTPUT
>> crop selection target, so this note remains valid.
>>
>
> Ack.
>
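
In other words, for the 1920x1080 case from the note above the client
would follow up with something like this (sketch only, single-planar API,
error handling omitted):

        struct v4l2_selection sel;

        memset(&sel, 0, sizeof(sel));
        sel.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        sel.target = V4L2_SEL_TGT_CROP;
        sel.r.left = 0;
        sel.r.top = 0;
        sel.r.width = 1920;    /* visible width */
        sel.r.height = 1080;   /* visible height */
        ioctl(fd, VIDIOC_S_SELECTION, &sel);
        /* sel.r holds the rectangle after any driver adjustment; this
         * is what ends up as the visible resolution in the stream
         * metadata, instead of the padded 1920x1088. */
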
>>>
>>>>>> +
>>>>>> + ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
>>>>>> + maximum rectangle within the coded resolution, which the cropped
>>>>>> + source frame can be composed into; if the hardware does not support
>>>>>> + composition or scaling, then this is always equal to the rectangle of
>>>>>> + width and height matching ``V4L2_SEL_TGT_CROP`` and located at (0, 0)
>>>>>> +
>>>>>> + ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
>>>>>> + equal to a rectangle of width and height matching
>>>>>> + ``V4L2_SEL_TGT_CROP`` and located at (0, 0)
>>>>>> +
>>>>>> + ``V4L2_SEL_TGT_COMPOSE``
>>>>>> + rectangle within the coded frame, which the cropped source frame
>>>>>> + is to be composed into; defaults to
>>>>>> + ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
>>>>>> + additional compose/scaling capabilities; resulting stream will
>>>>>> + have this rectangle encoded as the visible rectangle in its
>>>>>> + metadata
>>>>
>>>> I think the compose targets for OUTPUT are only needed if the hardware can
>>>> actually do scaling and/or composition. Otherwise they can (must?) be
>>>> dropped.
>>>>
>>>
>>> Note that V4L2_SEL_TGT_COMPOSE is defined to be the way for
>>> userspace to learn the target visible rectangle that's going to be
>>> encoded in the stream metadata. If we omit it, we wouldn't have a way
>>> that would be consistent between encoders that can do
>>> scaling/composition and those that can't.
>>
>> I'm not convinced about this. The standard API behavior is not to expose
>> functionality that the hardware can't do. So if scaling isn't possible on
>> the OUTPUT side, then it shouldn't expose OUTPUT compose rectangles.
>>
>> I also believe it is very unlikely that we'll see encoders capable of scaling
>> as it doesn't make much sense.
>
> It does make a lot of sense - WebRTC requires 3 different sizes of the
> stream to be encoded at the same time. However, unfortunately, I
> haven't yet seen an encoder capable of doing so.
>
>> I would prefer to drop this to simplify the
>> spec, and when we get encoders that can scale, then we can add support for
>> compose rectangles (and I'm sure we'll need to think about how that
>> influences the CAPTURE side as well).
>>
>> For encoders without scaling it is the OUTPUT crop rectangle that defines
>> the visible rectangle.
>>
>>>
>>> However, with your proposal of actually having selection rectangles
>>> for the CAPTURE queue, it could indeed be solved. The OUTPUT queue
>>> would expose a varying set of rectangles, depending on the hardware
>>> capabilities, while the CAPTURE queue would always expose a rectangle
>>> carrying that information.
>>
>> I think we should keep it simple and only define selection rectangles
>> when really needed.
>>
>> So encoders support CROP on the OUTPUT, and decoders support CAPTURE
>> COMPOSE (may be read-only). Nothing else.
>>
>> Once support for scaling is needed (either on the encoder or decoder
>> side), then the spec should be enhanced. But I prefer to postpone that
>> until we actually have hardware that needs this.
>>
>
> Okay, let's do it this way then. Actually, I don't even think there is
> much value in exposing information internal to the bitstream metadata
> like this, similarly to the coded size. My intention was to just
> ensure that we can easily add scaling/composing functionality later.
>
> I just removed the COMPOSE rectangles from my next draft.

I don't think that supporting scaling will be a problem for the API as
such, since this is supported for standard video capture devices. It
just gets very complicated trying to describe how to configure all this.

So I prefer to avoid this until we need to.

>
> [snip]
>>>
>>>>
>>>> Changing the OUTPUT format will always fail if OUTPUT buffers are already allocated,
>>>> or if changing the OUTPUT format would change the CAPTURE format (sizeimage in
>>>> particular) and CAPTURE buffers were already allocated and are too small.
>>>
>>> The OUTPUT format must not change the CAPTURE format by definition.
>>> Otherwise we end up in a situation where we can't commit, because both
>>> queue formats can affect each other. Any change to the OUTPUT format
>>> that wouldn't work with the current CAPTURE format should be adjusted
>>> by the driver to match the current CAPTURE format.
>>
>> But the CAPTURE format *does* depend on the OUTPUT format: if the output
>> resolution changes, then so does the CAPTURE resolution and esp. the
>> sizeimage value, since that is typically resolution dependent.
>>
>> The coda driver does this as well: changing the output resolution
>> will update the capture resolution and sizeimage. The vicodec driver does the
>> same.
>>
>> Setting the CAPTURE format basically just selects the codec to use, after
>> that you can set the OUTPUT format and read the updated CAPTURE format to
>> get the new sizeimage value. In fact, setting the CAPTURE format shouldn't
>> change the OUTPUT format, unless the OUTPUT format is incompatible with the
>> newly selected codec.
>
> Let me think about it for a while.

Sleep on it, always works well for me :-)

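To illustrate the flow I described above in code (a sketch, not normative;
single-planar API, H.264 and NV12 as arbitrary examples, error handling
omitted):

        struct v4l2_format fmt;

        /* Selecting the codec... */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_H264;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        /* ...then setting the raw source format, which may update the
         * CAPTURE format... */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;
        fmt.fmt.pix.width = 1920;
        fmt.fmt.pix.height = 1080;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        /* ...and reading the CAPTURE format back to pick up the new,
         * resolution-dependent sizeimage. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        ioctl(fd, VIDIOC_G_FMT, &fmt);
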
Regards,

Hans