Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface

From: Philipp Zabel
Date: Mon Aug 20 2018 - 11:34:04 EST


On Mon, 2018-08-20 at 23:27 +0900, Tomasz Figa wrote:
[...]
> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > > > > + the seek until a suitable resume point is found.
> > > > > +
> > > > > + .. note::
> > > > > +
> > > > > + There is no requirement to begin queuing stream starting exactly from
> > > > > + a resume point (e.g. SPS or a keyframe). The driver must handle any
> > > > > + data queued and must keep processing the queued buffers until it
> > > > > + finds a suitable resume point. While looking for a resume point, the
> > > >
> > > > I think the definition of a resume point is too vague in this place.
> > > > Can the driver decide whether or not a keyframe without SPS is a
> > > > suitable resume point? Or do drivers have to parse and store SPS/PPS if
> > > > the hardware does not support resuming from a keyframe without sending
> > > > SPS/PPS again?
> > >
> > > The thing is that existing drivers implement and user space clients
> > > rely on the behavior described above, so we cannot really change it
> > > anymore.
> >
> > My point is that I'm not exactly sure what that behaviour is, given the
> > description.
> >
> > Must a driver be able to resume from a keyframe even if userspace never
> > pushes SPS/PPS again?
> > If so, I think it should be mentioned more explicitly than just via an
> > example in parentheses, to make it clear to all driver developers that
> > this is a requirement that userspace is going to rely on.
> >
> > Or, if that is not the case, is a driver free to define "SPS only" as
> > its "suitable resume point" and to discard all input including keyframes
> > until the next SPS/PPS is pushed?
> >
> > It would be better to clearly define what a "suitable resume point" has
> > to be per codec, and not let the drivers decide for themselves, if at
> > all possible. Otherwise we'd need a away to inform userspace about the
> > per-driver definition.
>
> The intention here is that there is exactly no requirement for the
> user space to seek to any kind of resume point

No question about this.

> and so there is no point in defining such.

I don't agree.ÂLet me give an example:

Assume userspace wants to play back a simple h.264 stream that has
SPS/PPS exactly once, in the beginning.

If drivers are allowed to resume from SPS/PPS only, and have no way to
communicate this to userspace, userspace always has to assume that
resuming from keyframes alone is not possible. So it has to store
SPS/PPS and resubmit them with every seek, even if a specific driver
wouldn't require it: Otherwise those drivers that don't store SPS/PPS
themselves (or in hardware) would be allowed to just drop everything
after the first seek.
This effectively would make resending SPS/PPS mandatory, which doesn't
fit well with the intention of letting userspace just seek anywhere and
start feeding data (or: NAL units) into the driver blindly.

> The only requirement here is that the
> hardware/driver keeps processing the source stream until it finds a
> resume point suitable for it - if the hardware keeps SPS/PPS in its
> state then just a keyframe; if it doesn't then SPS/PPS.

Yes, but the difference between those two might be very relevant to
userspace behaviour.

> Note that thisÂis a documentation of the user space API, not a driver
> implementation guide. We may want to create the latter separately,
> though.

This is a good point, I keep switching the perspective from which I look
at this document.
Even for userspace it would make sense to be as specific as possible,
though. Otherwise, doesn't userspace always have to assume the worst?

> H264 is a bit special here, because one may still seek to a key frame,
> but past the relevant SPS/PPS headers. In this case, there is no way
> for the hardware to know that the SPS/PPS it has in its local state is
> not the one that applies to the frame. It may be worth adding that
> such case leads to undefined results, but must not cause crash nor a
> fatal decode error.
>
> What do you think?

That sounds like a good idea. I haven't thought about seeking over a
SPS/PPS change. Of course userspace must not expect correct results in
this case without providing the new SPS/PPS.

> > > Do we have hardware for which this wouldn't work to the point that the
> > > driver couldn't even continue with a bunch of frames corrupted? If
> > > only frame corruption is a problem, we can add a control to tell the
> > > user space to seek to resume points and it can happen in an
> > > incremental patch.
> >
> > The coda driver currently can't seek at all, it always stops and
> > restarts the sequence. So depending on the above I might have to either
> > find and store SPS/PPS in software, or figure out how to make the
> > firmware flush the bitstream buffer and restart without actually
> > stopping the sequence.
> > I'm sure the hardware is capable of this, it's more a question of what
> > behaviour is actually intended, and whether I have enough information
> > about the firmware interface to implement it.
>
> What happens if you just keep feeding it with next frames?

As long as they are well formed, it should just decode them, possibly
with artifacts due to mismatched reference buffers. There is an I-Frame
search mode that should be usable to skip to the next resume point, as
well, so I'm sure coda will end up not needing the
NEEDS_SEEK_TO_RESUME_POINT flag below. I'm just not certain at this
point whether I'll be able to (or: whether I'll have to) keep the
SPS/PPS state across seeks. I have seen so many decoder hangs with
malformed input on i.MX53 that I couldn't recover from, that I'm wary
to make any guarantees without flushing the bitstream buffer first.

> If that would result only in corrupted frames, I suppose the control (say
> V4L2_CID_MPEG_VIDEO_NEEDS_SEEK_TO_RESUME_POINT) would solve the
> problem?

For this to be useful, userspace needs to know what a resume point is in
the first place, though.

regards
Philipp