Re: [PATCH 0/2] media: intel-ipu3: allow the media graph to be used even if a subdev fails

From: Tomasz Figa
Date: Wed Nov 14 2018 - 03:28:21 EST


Hi Hans,

On Thu, Sep 27, 2018 at 7:22 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
>
> On 09/27/2018 12:13 PM, Mauro Carvalho Chehab wrote:
> > Em Thu, 27 Sep 2018 11:52:35 +0200
> > Hans Verkuil <hverkuil@xxxxxxxxx> escreveu:
> >
> >> Hi Javier,
> >>
> >> On 09/04/2018 01:30 PM, Javier Martinez Canillas wrote:
> >>> Hello,
> >>>
> >>> This series allows the ipu3-cio2 driver to properly expose a subset of the
> >>> media graph even if some drivers for the pending subdevices fail to probe.
> >>>
> >>> Currently the driver exposes a non-functional graph since the pad links are
> >>> created and the subdev dev nodes are registered in the v4l2 async .complete
> >>> callback. Instead, these operations should be done in the .bound callback.
> >>>
> >>> Patch #1 just adds a v4l2_device_register_subdev_node() function to allow
> >>> registering a single device node for a subdev of a v4l2 device.
> >>>
> >>> Patch #2 moves the logic of the ipu3-cio2 .complete callback to the .bound
> >>> callback. The .complete callback is just removed since it is empty after that.
> >>
> >> Sorry, I missed this series until you pointed to it on irc just now :-)
> >>
> >> I have discussed this topic before with Sakari and Laurent. My main problem
> >> with this is how an application can discover that not everything is online?
> >> And which parts are offline?
> >
> > Via the media controller? It should be possible for an application to see
> > if a videonode is missing using it.
> >
> >> Perhaps a car with 10 cameras can function with 9, but not with 8. How would
> >> userspace know?
> >
> > I guess this is not the only case where someone submitted a patch for
> > a driver that would keep working if some device node registration fails.
> >
> > It could be just déjà vu, but I have a vague sensation that I merged something
> > similar to it in the past on another driver, but I can't remember any details.
> >
> >>
> >> I completely agree that we need to support these advanced scenarios (including
> >> what happens when a camera suddenly fails), but it is the userspace aspects
> >> for which I would like to see an RFC first before you can do these things.
> >
> > Dynamic runtime failures should likely raise some signal. Perhaps a sort of
> > media controller event?
>
> See this old discussion: https://patchwork.kernel.org/patch/9849317/
>
> My point is that someone needs to think about this and make a proposal.
> There may well be a simple approach, but it needs to be specced first.

In that thread, you seem to have mentioned that having a Kconfig
option, disabled by default, to allow registering an incomplete media
topology would be an acceptable option. Do you think we could revive
that idea?
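
On the discovery question raised above (which parts are offline), here is a
rough userspace sketch of how an application could check whether a given
entity made it into a partially registered graph. It only uses the existing
MEDIA_IOC_ENUM_ENTITIES ioctl. The helper name media_entity_present() and
the entity names in the usage note are made up for illustration, and this
is not something the series itself adds.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/*
 * Hypothetical helper: return 1 if an entity called `name` is registered
 * on the media device at `media_path`, 0 if it is not, and -1 if the
 * media device node cannot be opened.
 */
static int media_entity_present(const char *media_path, const char *name)
{
	struct media_entity_desc desc;
	int found = 0;
	int fd = open(media_path, O_RDONLY);

	if (fd < 0)
		return -1;

	memset(&desc, 0, sizeof(desc));
	/* Asking for "next after id 0" yields the first entity in the graph. */
	desc.id = MEDIA_ENT_ID_FLAG_NEXT;
	while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &desc) == 0) {
		if (strncmp(desc.name, name, sizeof(desc.name)) == 0) {
			found = 1;
			break;
		}
		/* The kernel filled in desc.id; request the entity after it. */
		desc.id |= MEDIA_ENT_ID_FLAG_NEXT;
	}

	close(fd);
	return found;
}
```

An application that knows which sensors it expects could call, say,
media_entity_present("/dev/media0", "ov5670 10-0036") for each of them and
treat a 0 result as "this part is offline". Both the device path and the
sensor entity name here are assumed examples. This still leaves open how
userspace learns what to expect in the first place.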

Quoting some of the discussion points you mentioned:

> Some discussion points:

> 1) What about adding time-out support? Today we wait forever until all components
> are found, but wouldn't it make sense to optionally time-out? And if so, then
> we can keep most of the code the same and decide in the complete() callback
> whether or not we accept missing components. And decide how badly 'impaired'
> the system is. We can also still bring up all the devices in the complete rather
> than one-by-one as you proposed (and which I am not sure I like).

It sounds like an interesting extension, but not a must-have for handling
incomplete topologies.

I can also imagine the timeout handling introducing a lot of confusion
for userspace. For example, with a long timeout, the whole
initialization would have to wait for the timeout to elapse, which for
a smartphone user could mean that they can't start the camera
application (or it times out on its own and fails) until then.
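
To make the concern concrete, here is a small standalone model of the
"time out, then decide in complete()" policy. It is not kernel code: the
struct, the enum and evaluate_graph() are hypothetical names invented for
this sketch, and the v4l2 async framework has no such timeout today.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one awaited sub-device; not a kernel structure. */
struct awaited_subdev {
	const char *name;
	bool bound;     /* set once the sub-device driver has probed */
	bool required;  /* the pipeline is unusable without it */
};

enum graph_state {
	GRAPH_WAITING,  /* something is missing, still inside the timeout */
	GRAPH_COMPLETE, /* every awaited sub-device is bound */
	GRAPH_IMPAIRED, /* timed out, only optional sub-devices missing */
	GRAPH_FAILED,   /* timed out with a required sub-device missing */
};

/* Decide what the graph looks like `elapsed_ms` after the bridge probed. */
static enum graph_state evaluate_graph(const struct awaited_subdev *sd,
				       size_t n, unsigned int elapsed_ms,
				       unsigned int timeout_ms)
{
	bool missing = false, missing_required = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sd[i].bound)
			continue;
		missing = true;
		if (sd[i].required)
			missing_required = true;
	}

	if (!missing)
		return GRAPH_COMPLETE;
	if (elapsed_ms < timeout_ms)
		return GRAPH_WAITING;
	return missing_required ? GRAPH_FAILED : GRAPH_IMPAIRED;
}
```

In this model, a single optional sensor that never probes keeps the result
at GRAPH_WAITING for the full timeout_ms, and nothing is usable until the
state flips to GRAPH_IMPAIRED. That waiting window is the delay the user
would see before the camera application can start.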

> 2) This can be hard to test, so perhaps we should support some form of error
> injection to easily test what happens if something doesn't come up.

This is a very good point, but I think we should be able to get away
with something simpler: just blacklisting a particular device from being
bound to a driver. How to do it is another question, though... (I can
imagine binding a dummy driver to the device as an example solution.)

Best regards,
Tomasz