[PATCH 0/2] Document memory-to-memory video codec interfaces

From: Tomasz Figa
Date: Tue Jul 24 2018 - 10:06:39 EST


This series attempts to document what was discussed during the Media
Workshops at LinuxCon Europe 2012 in Barcelona and later at Embedded
Linux Conference Europe 2014 in Düsseldorf. The result was eventually
written down by Pawel Osciak and tweaked a bit by the Chrome OS video
team (mostly in a cosmetic way or to make the document more precise)
over the several years of Chrome OS using the APIs in production.

Note that most, if not all, of the API is already implemented in
existing mainline drivers, such as s5p-mfc or mtk-vcodec. The intention
of this series is just to formalize what we already have.

It is an initial conversion from Google Docs to RST, so formatting is
likely to need some further polishing. It is also the first time for me
to create such a long RST document. I could not find any other instance
of similar userspace sequence specifications among our Media documents,
so I mostly followed what was there in the source. Feel free to suggest
a better format.

Much of the credit should go to Pawel Osciak for writing most of the
original text of the initial RFC.

Changes since RFC:
(https://lore.kernel.org/patchwork/project/lkml/list/?series=348588)
- The number of changes is too big to list them all here. Thanks to
a huge number of very useful comments from everyone (Philipp, Hans,
Nicolas, Dave, Stanimir, Alexandre), the interfaces should be much
better specified now. The issues collected since previous revisions and
the answers leading to this revision are listed below.

General issues

Should TRY_/S_FMT really return an error if an invalid format is set,
rather than falling back to some valid format? That would contradict
the general spec.

Answer: Keep non-error behavior for existing spec compatibility, but
consider returning error for Request API.
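
As an illustration of the non-error behavior, here is a minimal sketch
of how a driver-side TRY_FMT helper might adjust a request instead of
rejecting it. The struct, the limits and the list of supported formats
are hypothetical and not taken from any existing driver.

```c
#include <stdint.h>

/* Same bit layout as the kernel's v4l2_fourcc() macro. */
#define DEMO_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical driver capabilities, for illustration only. */
#define DEMO_FMT_H264 DEMO_FOURCC('H', '2', '6', '4')
#define DEMO_FMT_VP8  DEMO_FOURCC('V', 'P', '8', '0')
#define DEMO_MIN_DIM  16u
#define DEMO_MAX_DIM  4096u

struct demo_fmt {
	uint32_t pixelformat;
	uint32_t width;
	uint32_t height;
};

static uint32_t demo_clamp(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Adjust the requested format in place. Always returns 0: an
 * unsupported request is replaced by a valid one, not rejected.
 */
static int demo_try_fmt(struct demo_fmt *f)
{
	if (f->pixelformat != DEMO_FMT_H264 && f->pixelformat != DEMO_FMT_VP8)
		f->pixelformat = DEMO_FMT_H264; /* fall back to a default */

	f->width = demo_clamp(f->width, DEMO_MIN_DIM, DEMO_MAX_DIM);
	f->height = demo_clamp(f->height, DEMO_MIN_DIM, DEMO_MAX_DIM);
	return 0;
}
```

The caller is expected to read back the adjusted fields rather than
assume that the requested values were accepted.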

The number of possible opens of M2M video node should not be
artificially limited. Drivers should defer allocating limited resources
(e.g. hardware instances) until initialization is attempted to allow
probing and pre-opening of video nodes. (Hans suggested vb2 queue setup
or REQBUFS.)

Answer: Allocate hardware resources in REQBUFS (or later).

How about colorimetry settings (colorspace, xfer function, etc.)?
Normally they are not needed for decoding itself, but some codecs can
parse them from the stream. If user space can parse them by itself,
should it set them on the OUTPUT queue? What should happen on the
CAPTURE queue if colorimetry can be parsed, and what if it can't?

Answer: Mention copying colorimetry from OUTPUT to CAPTURE queue only.
Potentially extend for hardware that can do colorspace conversion later.

Decoder issues

Is VIDIOC_ENUM_FRAMESIZES mandatory? Coda doesn't implement it, and
neither does s5p-mfc.

Answer: Make it mandatory. Otherwise nobody would implement it.

Should we support all the three specification modes of
VIDIOC_ENUM_FRAMESIZES (continuous, discrete and stepwise)? On both
queues?

Answer: Support all 3 size specification modes, not to diverge from
general specification.

Should ENUM_FRAMESIZES return coded or visible size?

Answer: That should be the value that characterizes the stream, so
coded size. Visible size is just a crop.

How should ENUM_FRAMESIZES be affected by profiles and levels?

Answer: Not in current specification - the logic is too complicated and
it might make more sense to actually handle this in user space. (In
theory, level implies supported frame sizes + other factors.)

Is VIDIOC_ENUM_FRAMEINTERVALS mandatory? Coda doesn't implement it, and
neither does s5p-mfc. What is the meaning of frame interval for m2m in
general?

Answer: Do not include in this specification, because there is no way
to return meaningful values for memory-to-memory devices.

What to do for coded formats for which coded resolution can't be parsed
(due to format or hardware limitation)? Current draft mentions setting
them on OUTPUT queue. What would be the effect on CAPTURE queue?
Should OUTPUT queue format include width/height? Would that mean coded
or visible size? If so, should they always be configured? Gstreamer
seems to pass visible size from the container.

Answer: If OUTPUT format has non-zero width and height, the driver must
behave as if it had instantly parsed the coded size from the stream,
including updating CAPTURE format and queuing a source change event. If
other parameters are parsed later by hardware, a dynamic resolution
change sequence would be triggered. However, for hardware not parsing
such parameters from the stream, stateless API should be seriously
considered.

How about the legacy behavior of G_FMT(CAPTURE) blocking until queued
OUTPUT buffers are processed?

Answer: Do not include in the specification, keep in existing drivers for
compatibility.

Should we allow preallocating CAPTURE queue before parsing as an
optimization? If user space allocated buffers bigger than required, it
may be desirable to use them if hardware allows. Similarly, if a
decreasing resolution change happened, it may be desirable to avoid
buffer reallocation. Gstreamer seems to rely on this behavior being
allowed and works only by luck, because it allocates buffers for
resolutions matching what is parsed later.

Answer: Yes. The client can set up CAPTURE queue beforehand. The driver
would still issue a source change event, but if existing buffers are
compatible with driver requirements (size and count), there is no need to
reallocate. Similarly for dynamic resolution change.
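
The compatibility check described above (size and count) could be
sketched on the client side as follows. The struct and function names
are hypothetical. In practice the required values would come from
G_FMT(CAPTURE) and a control such as V4L2_CID_MIN_BUFFERS_FOR_CAPTURE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a CAPTURE buffer set. */
struct demo_bufset {
	size_t count; /* number of buffers */
	size_t size;  /* bytes per buffer */
};

/*
 * Existing buffers can be kept if there are at least as many as the
 * driver requires and each one is at least as large as required.
 */
static bool demo_can_reuse(const struct demo_bufset *have,
			   const struct demo_bufset *need)
{
	return have->count >= need->count && have->size >= need->size;
}
```

If the check fails, the client falls back to the usual STREAMOFF,
REQBUFS(0), REQBUFS(n) sequence on the CAPTURE queue.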

What is the meaning of CAPTURE format? Should it be coded format,
visible format or something else?

Answer: It should be a hardware-specific frame buffer size (>= coded
size), minimum needed for decoding to proceed.
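
To make the relation between visible, coded and CAPTURE sizes concrete,
here is a small arithmetic sketch. The 16-pixel macroblock alignment
applies to H.264 coded sizes. The additional hardware alignment passed
to demo_capture_size() is a made-up value; real hardware constraints
differ per device.

```c
#include <stdint.h>

struct demo_size {
	uint32_t width;
	uint32_t height;
};

/* Round v up to the next multiple of a (a must be non-zero). */
static uint32_t demo_align_up(uint32_t v, uint32_t a)
{
	return (v + a - 1) / a * a;
}

/* Coded size: visible size rounded up to whole 16x16 macroblocks. */
static struct demo_size demo_coded_size(struct demo_size visible)
{
	struct demo_size s = {
		demo_align_up(visible.width, 16),
		demo_align_up(visible.height, 16),
	};
	return s;
}

/* CAPTURE size: coded size rounded up to a hypothetical hardware
 * alignment, so it is always >= the coded size. */
static struct demo_size demo_capture_size(struct demo_size coded,
					  uint32_t hw_align)
{
	struct demo_size s = {
		demo_align_up(coded.width, hw_align),
		demo_align_up(coded.height, hw_align),
	};
	return s;
}
```

For an 854x480 visible size this gives an 864x480 coded size and, with
an assumed 64-pixel hardware alignment, an 896x512 CAPTURE format.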

Which selection target should be used for visible rectangle? Should we
also report CROP/COMPOSE_DEFAULT and COMPOSE_PADDED (the area that
hardware actually overwrites)? How about CROP_BOUNDS?

Answer: COMPOSE. Also require most of the other meaningful targets.
Make them default to visible rectangle and, on hardware without
crop/compose/scale ability, read-only.

What if the hardware only supports handling complete frames? Current
draft says that source OUTPUT buffers must contain:
- H.264/AVC: one or more complete NALUs of an Annex B elementary stream;
  one buffer does not have to contain enough data to decode a frame;

Answer: Defer to specification of particular V4L2_PIX_FMT (FourCC), to be
further specified later. Current drivers seem to implement support for
various formats in various ways (especially H264). Moreover, various
userspace applications have their own way of splitting the bitstream. We
need to keep all existing users working, so sorting this out will require
quite a bit of effort and should not be blocking the already de facto
defined part of the specification.

Does the driver need to flush its CAPTURE queue internally when a seek is
issued? Or the client needs to explicitly restart streaming on CAPTURE
queue?

Answer: No guarantees for CAPTURE queue from codec. User space needs to
handle.

Must all drivers support dynamic resolution change? Gstreamer parses the
stream itself and it can handle the change itself by resetting the
decode.

Answer: Yes, if it's a feature of the coded format. There is already
userspace relying on this. Hardware that cannot support this should
likely use the stateless codec interface.

What happens with OUTPUT queue format (resolution, colorimetry) after
resolution change? Currently always 0 on s5p-mfc. mtk-vcodec reports
coded resolution.

Answer: Coded size on OUTPUT queue.

Can we allow G_FMT(CAPTURE) after resolution change before
REQBUFS(CAPTURE, 0)? This would allow keeping current buffer set if the
resolution decreased.

Answer: Yes, even before STREAMOFF(CAPTURE).

Should the client also read visible resolution after resolution change?
Current draft doesn't mention it.

Answer: Yes.

Is there a requirement or expectation for the encoded data to be framed
as a single encoded frame per buffer, or is feeding in full buffer sized
chunks from an ES valid? It's not stated in the description of
V4L2_PIX_FMT_H264 etc. either. Should we tie such requirements to
particular format (FourCC)?

Answer: Defer to specification of particular V4L2_PIX_FMT (FourCC), to be
further specified later. Similarly to the earlier issue with H264.

How about first frame in case of VP8, VP9 or H264_NO_SC? Should that
include only headers?

Answer: There is no separate header in case of VP8 and VP9. There are
only full frames. V4L2_PIX_FMT_H264_NO_SC implies user space splitting
headers (SPS, PPS) and frame data (slice) into separate buffers, due
to the nature of the format.

Should we have a separate format for headers and data in separate
buffers?

Answer: As with the other format-specific issues - defer to format
specification.

How about timestamp copying between OUTPUT and CAPTURE buffers?
The draft says "buffers may become available on the CAPTURE queue
without additional buffers queued to OUTPUT (e.g. during flush or EOS)".
What timestamps would those buffers have?

Answer: Those CAPTURE buffers would originate from an earlier OUTPUT
buffer, just being delayed. Their timestamps would match those of the
originating OUTPUT buffers.
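
A client can use this timestamp copying to associate a dequeued CAPTURE
buffer with the OUTPUT buffer it came from. A minimal sketch, with a
hypothetical bookkeeping table kept by the client:

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical per-frame bookkeeping kept by the client. */
struct demo_pending {
	struct timeval ts; /* timestamp set on the OUTPUT buffer at QBUF */
	int frame_id;      /* client-side identifier of that frame */
};

/*
 * Look up the frame whose OUTPUT timestamp equals the timestamp of a
 * dequeued CAPTURE buffer. Returns its frame_id, or -1 if none matches.
 */
static int demo_match_frame(const struct demo_pending *pending, size_t n,
			    const struct timeval *capture_ts)
{
	for (size_t i = 0; i < n; i++) {
		if (pending[i].ts.tv_sec == capture_ts->tv_sec &&
		    pending[i].ts.tv_usec == capture_ts->tv_usec)
			return pending[i].frame_id;
	}
	return -1;
}
```

This only works if the client assigns a distinct timestamp to every
OUTPUT buffer it queues.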

Supposedly there are existing decoders that can't deal with seek to a
non-resume point and end up returning corrupt frames.

Answer: There is userspace relying on this behavior not crashing the
system or causing a fatal decode error. Corrupt frames are okay. We can
extend the specification later with a control that gives a hint to the
client.

Maybe we should state what happens to reference buffers, things like DPB.
Can we CMD_STOP, then V4L2_DEC_CMD_START and continue with the ref kept?

Answer: Refs lost - same as STREAMOFF(CAPTURE), STREAMON(CAPTURE), except
that buffers are successfully returned to user space.

After I streamoff, do I need to send PPS/SPS again after STREAMON, or
will the codec remember, and the following IDR is fine? (ndufresne: For
sure the DPB will be gone)

Answer: Decoder needs to keep PPS/SPS across STREAMOFF(OUTPUT),
STREAMON(OUTPUT). If we seek to another place in the stream that
references the same PPS/SPS, no need to queue the same PPS/SPS again
(since decoder needs to hold it). If we seek somewhere far, skipping
PPS/SPS on the way, we can't guarantee anything. In practice most client
implementations already include PPS/SPS at seek before IDR.

Encoder issues

Is S_FMT() really mandatory during initialization sequence? In theory,
the client could just G_FMT() and use what's already there. (tfiga: In
practice unlikely.)

Answer: Not mandatory, but it's the only thing that makes sense.

When does the actual encoding start? Once both queues are streaming?

Answer: When both queues start streaming.

When does the encoding stop/reset? As soon as one queue receives
STREAMOFF?

Answer: STREAMOFF on CAPTURE. After restarting streaming on CAPTURE,
encoder will generate a stream independent of the stream generated
before. E.g. no reference frames from before the restart (no H.264 long
term reference), and any headers that must be included in a standalone
stream must be produced again. OUTPUT queue might be restarted on demand
to let the client change the buffer set. This could be extended later to
support encoding streams with dynamic resolution changes.

How should we handle hardware that cannot control encoding parameters
dynamically? Should the driver internally stop, reconfigure and restart?
Or should we defer this to user space?

Answer: Disallow setting respective controls when streaming.

Which queue should be master, i.e. be capable of overriding settings on
the other queue?

Answer: CAPTURE, since coded format is likely to determine the list of
supported raw formats.

How should we describe the behavior of two queues?

Answer: Say that standard M2M principles apply. Also mention no direct
relation between order of raw frames being queued and encoded frames
dequeued, other than timestamp.

How should encoder controls be handled?

Answer: Leave it up to the driver. Use Request API to set controls for
exact frames.

What should VIDIOC_STREAMON do on an already streaming queue, but after
V4L2_ENC_CMD_STOP?
https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/vidioc-encoder-cmd.html
says: "A read() or VIDIOC_STREAMON call sends an implicit START command
to the encoder if it has not been started yet."
https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/vidioc-streamon.html
says: "If VIDIOC_STREAMON is called when streaming is already in
progress, or if VIDIOC_STREAMOFF is called when streaming is already
stopped, then 0 is returned. Nothing happens in the case of
VIDIOC_STREAMON[...]."

Answer: Nothing, as per the specification. Use V4L2_ENC_CMD_START for
resuming from pause.

Tomasz Figa (2):
  media: docs-rst: Document memory-to-memory video decoder interface
  media: docs-rst: Document memory-to-memory video encoder interface

 Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
 Documentation/media/uapi/v4l/dev-encoder.rst | 550 ++++++++++++
 Documentation/media/uapi/v4l/devices.rst     |   2 +
 Documentation/media/uapi/v4l/v4l2.rst        |  12 +-
 4 files changed, 1435 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
 create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst

--
2.18.0.233.g985f88cf7e-goog