Re: [PATCH v5 15/39] [media] v4l2: add a frame interval error event

From: Steve Longerbeam
Date: Mon Mar 13 2017 - 17:48:02 EST

On 03/13/2017 10:10 AM, Hans Verkuil wrote:
On 03/13/2017 06:06 PM, Steve Longerbeam wrote:

On 03/13/2017 03:53 AM, Hans Verkuil wrote:
On 03/13/2017 11:45 AM, Russell King - ARM Linux wrote:
On Mon, Mar 13, 2017 at 11:02:34AM +0100, Hans Verkuil wrote:
On 03/11/2017 07:14 PM, Steve Longerbeam wrote:
The event must be user visible, otherwise the user has no indication
the error, and can't correct it by stream restart.

In that case the driver can detect this and call vb2_queue_error. It's
what it is there for.

The event doesn't help you since only this driver has this issue. So nobody
will watch this event, unless it is sw specifically written for this SoC.

Much better to call vb2_queue_error to signal a fatal error (which this
apparently is) since there are more drivers that do this, and vivid supports
triggering this condition as well.

So today, I can fiddle around with the IMX219 registers to help gain
an understanding of how this sensor works. Several of the registers
(such as the PLL setup [*]) require me to disable streaming on the
sensor while changing them.

This is something I've done many times while testing various ideas,
and is my primary way of figuring out and testing such things.

Whenever I resume streaming (provided I've let the sensor stop
streaming at a frame boundary) it resumes as if nothing happened. If I
stop the sensor mid-frame, then I get the rolling issue that Steve
reports, but once the top of the frame becomes aligned with the top of
the capture, everything then becomes stable again as if nothing happened.

The side effect of what you're proposing is that when I disable streaming
at the sensor by poking at its registers, rather than the capture just
stopping, an error is going to be delivered to gstreamer, and gstreamer
is going to exit, taking the entire capture process down.

This severely restricts the ability to be able to develop and test
sensor drivers.

So, I strongly disagree with you.

Loss of capture frames is not necessarily a fatal error - as I have been
saying repeatedly. In Steve's case, there's some unknown interaction
between the source and iMX6 hardware that is causing the instability,
but that is simply not true of other sources, and I oppose any idea that
we should cripple the iMX6 side of the capture based upon just one
hardware combination where this is a problem.

Steve suggested that the problem could be in the iMX6 CSI block - and I
note comparing Steve's code with the code in FSL's repository that there
are some changes that are missing in Steve's code to do with the CCIR656
sync code setup, particularly for >8 bit. The progressive CCIR656 8-bit
setup looks pretty similar though - but I think what needs to be asked
is whether the same problem is visible using the FSL/NXP vendor kernel.

* - the PLL setup is something that requires research at the moment.
Sony's official position (even to their customers) is that they do not
supply the necessary information, instead they expect customers to tell
them the capture settings they want, and Sony will throw the values into
a spreadsheet, and they'll supply the register settings back to the
customer. Hence, the only way to proceed with a generic driver for
this sensor is to experiment, and experimenting requires the ability to
pause the stream at the sensor while making changes. Take this away,
and we're stuck with the tables-of-register-settings-for-set-of-fixed-
capture-settings approach. I've made a lot of progress away from this
which is all down to the flexibility afforded by _not_ killing the
capture process.

In other words: Steve should either find a proper fix for this, or only
call vb2_queue_error in this specific case. Sending an event that nobody
will know how to handle or what to do with is pretty pointless IMHO.

Let's just give him time to try and figure out the real issue here.

This is a long-standing issue, I've traveled to Hildesheim working with
our customer to try and get to the bottom of it. I can go into a lot of
details from those trips, we probed the bt.656 bus with a logic analyzer
and I can share those results with anyone who asks. But the results of
those investigations indicate the CSI is not handling the SAV/EAV sync
codes correctly - if there is a shift in the line position at which
those codes occur, the CSI/IPU does not abort the frame capture DMA
and start from the new sync code position, it just continues to capture
lines until the programmed number of lines are transferred, hence you
get these split images. Freescale also informed us of a mechanism in the
IPU that will add lines if it detects these short frames, until the
programmed number of lines are reached. Apparently that is what creates
the rolling effect, but this rolling can last for up to a full minute,
which is completely unacceptable, it must be corrected as soon as

So the only thing we could come up with was to monitor frame intervals,
this is purely empirical, but we observed that frame intervals drop
by approx. one line time (~60 usec) when these short frames are
received. I don't really have any explanation for that but we take
advantage of that observation by sending this event to userspace so
the problem can be corrected immediately with a stream restart.

As I've said, the ADV718x does not provide _any_ indication via status
when the shift in the sync code position occurs. And the IPU is also
severely lacking in DMA completion status as well (no short packet
status like in USB for example). So the only way to detect this event
is by monitoring the frame intervals.

I will review differences in the CCIR code register setup from FSL's
repo that Russell pointed out, but I'm fairly sure those code registers
are setup correctly, there's not much room for variability in those
values. They only define the values of the sync codes, so the CSI can
detect them, those values are defined by the bt.656 spec.

Anyway, perhaps for now I can remove the event, but keep the FI
monitoring, and for now just report a kernel error message on
a detected bad FI.

Is it possible to detect this specific situation? Apparently this issue
is not present (or at least resolves itself very quickly) on the imx219.

The imx219 is not a BT.656 interface, it is MIPI CSI-2, so this is
not an issue for imx219. It is an issue for any sensor with a
parallel BT.656 interface.

If it is not possible, then a private event would be acceptable, but
needs to be carefully documented. After all, it is a workaround for a bug,
since otherwise there would be no need for this event.

Ok, sounds like a plan, I'll keep the event as a private event and make
sure it is documented well. Yes it is a workaround, for a silicon bug,
although Freescale/NXP has not issued an errata for it yet AFAIK. THat
might be because they claim to handle it via this adding lines
mechanism, but that mechanism doesn't work well (long duration rolling),
or not at all (permanent split images).