Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

From: Greg KH
Date: Wed Oct 03 2012 - 13:12:12 EST


On Wed, Oct 03, 2012 at 04:36:53PM +0200, Kay Sievers wrote:
> On Wed, Oct 3, 2012 at 12:12 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > Mauro, what version of udev are you using that is still showing this
> > issue?
> >
> > Kay, didn't you resolve this already? If not, what was the reason why?
>
> It's the same in the current release, we still haven't wrapped our
> head around how to fix it/work around it.

Ick, as this is breaking people's previously-working machines, shouldn't
this be resolved quickly?

> Unlike what the heated and pretty uncivilized and rude emails here
> claim, udev does not dead-lock or "break" things, it's just "slow".
> The modprobe event handling runs into a ~30 second event timeout.
> Everything is still fully functional though, there's only this delay.

Mauro said it broke the video drivers. Mauro, if you wait 30 seconds,
does everything then "work"?

Not to say that waiting 30 seconds is a correct thing here...

> Udev ensures full dependency resolution between parent and child
> events. Parent events have to finish the event handling and have to
> return, before child event handlers are started. We need to ensure
> such things so that (among other things) disk events have finished
> their operations before the partition events are started, so they can
> rely on and access their fully set up parent devices.
>
> What happens here is that the module_init() call blocks in a userspace
> transaction, creating a child event that is not started until the
> parent event has finished. The event handler for modprobe times out,
> and only then does the child event load the firmware.

module_init() can do lots of "bad" things: sleeping, asking for
firmware, and lots of other things. To have userspace block because of
this doesn't seem very wise.

> Having a kernel module rely on a running and fully functional
> userspace to return from module_init() is surely a broken driver
> model, at least it's not how things should work. If userspace does not
> respond to firmware requests, module_init() locks up until the
> firmware timeout happens.

But previously this all "just worked" as we ran 'modprobe' in a new
thread/process, right? What's wrong with going back to just execing
modprobe and letting that process go off and do whatever it wants to
do? It can't be that "expensive" as modprobe is a very slow thing, and
it should solve this issue. udev will then have handled the 'a device
has shown up, run modprobe' event in the correct order, and then
anything else that the module_init() process wants to do, it can do
without worrying about stopping anything else in the system that might
want to happen at the same time (like load multiple modules in a row).
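
Roughly what I have in mind, as a userspace sketch only. This is not
udev's code, and spawn_detached() is a name made up for illustration.
It assumes the event handler only needs to start the helper, not learn
how it ends, so it double-forks and returns as soon as the helper has
been launched:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] in a detached process and return
 * without waiting for it to finish.
 *
 * The intermediate child forks the real helper and exits right away,
 * so the helper is reparented and the caller never has to reap it.
 *
 * Returns 0 if the helper process was created, -1 otherwise. It does
 * NOT report whether the exec itself, or the helper, succeeded.
 */
int spawn_detached(char *const argv[])
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Intermediate child: fork the helper, then exit. */
		pid_t helper = fork();

		if (helper < 0)
			_exit(1);
		if (helper == 0) {
			execvp(argv[0], argv);
			_exit(127);	/* exec failed */
		}
		_exit(0);
	}

	/* Reap only the short-lived intermediate child. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The event handler would call it with something like
{ "modprobe", modalias, NULL }, where modalias is a placeholder for
whatever string the event carries, and then mark the event as handled.
However long module_init() sleeps after that, it only holds up that one
detached process.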

> This all is not so much about how probe() should behave, it's about a
> fragile dependency on a specific userspace transaction to link a
> loadable module into the kernel. Drivers should avoid such loops for
> many reasons. Also, it's unclear in many cases how such a model should
> work at all if the module is compiled in and initialized when no
> userspace is running.
>
> If that unfortunate module_init() lockup can't be solved properly in
> the kernel, we need to find out if we need to make the modprobe
> handling in udev async, or let firmware events bypass dependency
> resolving. As mentioned, we haven't decided as of now which road to
> take here.

It's not a lockup; there have never been rules about what a driver could
and could not do in its module_init() function. Sure, there are some
not-nice drivers out there, but don't halt the whole system just because
of them.

I recommend making module loading async, like it used to be, and then
all should be fine, right?

That's also the way mdev works, and I don't think that people have
been having problems there. :)

thanks,

greg k-h