Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM

From: Luis R. Rodriguez
Date: Mon Sep 08 2014 - 21:04:50 EST


On Fri, Sep 5, 2014 at 3:40 PM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello, Luis.
>
> On Fri, Sep 05, 2014 at 11:12:17AM -0700, Luis R. Rodriguez wrote:
>> Meanwhile we are allowing a major design consideration such as a 30
>> second timeout for both init + probe all of a sudden become a hard
>> requirement for device drivers. I see your point but can't also be
>> introducing major design changes willy nilly either. We *need* a
>> solution for the affected drivers.
>
> Yes, make the behavior specifically specified from userland. When did
> I ever say that there should be no solution for the problem? I've
> been saying that the behavior should be selected from userland from
> the get-go, haven't I?
>
> I have no idea how the selection should be. It could be per-insmod or
> maybe just a system-wide flag with explicit exceptions marked on
> drivers is good enough. I don't know.

It's perfectly understandable if we don't know what path to take yet,
and it's also understandable for it to take time to figure out.
Meanwhile, though, systemd has already merged a policy of a 30 second
timeout for *all drivers*, so we need:

0) a solution for the affected combinations of systemd / drivers
1) an agreed path forward

If we want tight integration between the kernel and the init system, we
need to be able to communicate effectively, folks, and I'm afraid this
isn't happening. I last noted on systemd-devel that the 30 second
timeout was merged under incorrect assumptions, that it is not just
init that at times causes delays, and that since the driver core
currently batches both init and probe we need a non-fatal userspace
solution [0] while we work on a kernel-side design for making probe
asynchronous on the drivers where that makes sense. A proper kernel
solution may take longer than expected. We can't just assume a
probe_async flag on drivers will suffice; in fact, as Tejun notes, it's
wrong, since historically some userland has depended on the synchronous
behaviour of module loading for some drivers, and that load *could*
have taken a while.
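
To make "non-fatal" concrete, here is a minimal sketch in Python of a
worker timeout that warns instead of killing. This is only an
illustration under assumptions: it is not systemd or udev code, and
run_with_soft_timeout and the worker command are hypothetical names
made up for the example.

```python
import subprocess
import sys
import time


def run_with_soft_timeout(cmd, timeout_s):
    """Run cmd; on timeout, warn and keep waiting instead of killing it.

    Hypothetical helper for illustration only, not an existing API.
    Returns (returncode, warned, elapsed_seconds).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    warned = False
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # A fatal policy would call proc.kill() here. The non-fatal
        # policy only reports the slow worker and lets it finish.
        warned = True
        print(f"warning: {cmd!r} exceeded {timeout_s}s, still waiting",
              file=sys.stderr)
        proc.wait()
    return proc.returncode, warned, time.monotonic() - start
```

With a policy like this a slow init + probe shows up as a warning that
can be reported, rather than as a driver that failed to load.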

Kay, Lennart, any recommendations?

[0] http://lists.freedesktop.org/archives/systemd-devel/2014-August/022696.html

>> Also what stops drivers from going ahead and just implementing their
>> own async probe? Would that now be frowned upon as it strays away
>
> The drivers can't. How many times should I explain the same thing
> over and over again. libata can't simply make probing asynchronous
> w.r.t. module loading no matter how it does it. Yeah, sure, there can
> be other drivers which can do that without most people noticing it but
> a storage driver isn't one of them and the storage drivers are the
> problematic ones already, right?

It's one of the subsystems that has suffered from this, but not the only one.

>> from the original design? The bool would let those drivers do this
>> easily, and we would still need to identify these drivers, although
>> this particular change can be NAK'd, Oleg's suggestion of
>> WARN_ON(fatal_signal_pending()) at the end of load_module() seems to me
>> at least needed. And if it's not async probe... what do those with
>> failed drivers do?
>
> I'm getting tired of explaining the same thing over and over again.
> The said change was nacked because the whole approach of "let's see
> which drivers get reported on the issue which exists basically for all
> drivers and just change the behavior of them" is braindead. It makes
> no sense whatsoever. It doesn't address the root cause of the problem
> while making the same class of drivers behave significantly
> differently for no good reason. Please stop chasing your own tail and
> try to understand the larger picture.

Understood.

Luis