Re: [PATCH v2 2/4] driver core: enable drivers to use deferred probe from init

From: Luis R. Rodriguez
Date: Mon Jul 28 2014 - 20:27:09 EST


On Mon, Jul 28, 2014 at 4:46 PM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jul 28, 2014 at 12:48:32PM -0700, Luis R. Rodriguez wrote:
>> On Mon, Jul 28, 2014 at 12:04 PM, Luis R. Rodriguez
>> <mcgrof@xxxxxxxxxxxxxxxx> wrote:
>> > On Mon, Jul 28, 2014 at 11:55 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> >> So, what drivers are having problems in their init sequence, and why
>> >> aren't they using async firmware loading?
>> >
>> > Fixing drivers is one thing, fixing drivers *now* because *now*
>> > drivers are failing due to a regression is another thing, and that's
>> > what we have now, so let's just be clear on that. The 30 second rule is
>> > a major driver requirement change and it should not be taken lightly.
>> > All of a sudden this is a new requirement, but you won't know that
>> > unless you're reading these threads or hit an issue. That's an issue
>> > in itself.
>
> That "regression" is something that userspace has decided to do, not
> anything the kernel changed,

Actually commit 786235ee seems to have been what caused this issue:
systemd would just send the SIGKILL, and that change forced a bail on
probe. Hence Canonical's workaround to modify kthread_create() to not
bail out upon SIGKILL:

http://thread.gmane.org/gmane.linux.ubuntu.devel.kernel.general/39123

> so perhaps you should just patch your
> modprobe and be done with it until you can fix up those drivers?

To ignore SIGKILL?

> And putting a horrid hack in the driver core, just because of some
> really bad drivers, is not ok, that's an interface _I_ will have to
> support for the next few decades.

I understand, hence review.

> And it's going to take you a while to get something like this ever
> merged in anyway, odds are you can fix up the driver faster...

That requires quite a few changes and commitment and, again, there
are quite a few drivers that we can run into in the community;
we've just spotted 2 so far.

>> > The cxgb4 driver is an example where, although I did propose patches
>> > to use asynch firmware loading, the entire registration of the
>> > netdevice would need to be changed as well in order to get this right.
>> > In short we have to scramble now to first identify drivers that have
>> > long init sequences and then fix them. There's an assumption that we
>> > can easily fix drivers, but this can take time. This series, although
>> > it does take advantage of a kernel interface for other uses, lets us
>> > identify these drivers on the kernel ring buffer, so we can go and
>> > address fixing them with time.
>>
>> Another thing that came up during asynch firmware review / integration
>> on cxgb4 was that it would not be the only thing that would need to be
>> fixed: the driver also has a ton of ports and, at least as per my
>> review, the proper fix would be to use the platform register stuff. It
>> is a major rewrite of the driver but an example of a driver that needs
>> quite a bit of work to meet this new 30 second driver requirement.
>
> It shouldn't be using any platform driver stuff, it's a pci device, not
> a platform device.

The general PCI stuff is already used. The reason for suggesting the
platform_device_register_simple() stuff is that the device has tons of
ports and each port will in turn register a new struct netdevice, so
one device can end up having tons of different network devices. The
platform stuff would allow handling each netdevice candidate
separately as part of the internal driver architecture; right now it's
some scary loop thing that in my eyes can be very error prone. See
drivers/net/ethernet/8390/ne.c. This discussion:

https://lkml.org/lkml/2014/6/25/815

> Why not just put the initial "register the device" in a single-shot
> workqueue or thread or something like that so that modprobe returns
> instantly back with a success and all is fine?

That surely is possible but why not a general solution for such kludges?
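
For reference, a minimal sketch of the single-shot idea suggested above,
written as a userspace analogy: a pthread stands in for a kernel
workqueue or thread. Everything here (fake_dev, fake_probe(),
slow_register(), fake_remove()) is a made-up name for illustration and
is not taken from cxgb4 or the driver core.

```c
/* Userspace analogy only: a pthread stands in for a kernel work item. */
#include <pthread.h>

/* Hypothetical device state; not a real kernel structure. */
struct fake_dev {
	pthread_t worker;
	int registered;	/* set by the worker once the slow init is done */
};

/* Stand-in for the slow part of init (firmware load, port setup, ...). */
static void *slow_register(void *arg)
{
	struct fake_dev *dev = arg;
	volatile unsigned long spin;

	for (spin = 0; spin < 1000000UL; spin++)
		;	/* pretend to do slow work */

	dev->registered = 1;
	return NULL;
}

/* Stand-in for probe(): kick off the slow work once, return right away. */
static int fake_probe(struct fake_dev *dev)
{
	dev->registered = 0;
	return pthread_create(&dev->worker, NULL, slow_register, dev);
}

/* Stand-in for remove(): wait for the outstanding work before teardown. */
static int fake_remove(struct fake_dev *dev)
{
	return pthread_join(dev->worker, NULL);
}
```

The point is only that fake_probe() returns before the slow part has
finished, so the caller (modprobe in the real case) is not held up, and
that the remove path has to wait for the work item before tearing
anything down.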

> Again, trying to hack the "deferred init" logic for PCI drivers isn't
> ok, I'm not going to take that into the driver core if at all possible,
> sorry.

No need to apologize, I'm looking for the best solution here after all.
One userspace kludge is surely better than one per driver while
drivers are fixed for this new driver requirement. It's just kind of
odd, the circle of events for a kernel issue, from kernel --> systemd
--> modprobe as a workaround.

Luis