Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
From: Luis R. Rodriguez
Date: Fri Sep 05 2014 - 03:47:57 EST
On Fri, Sep 5, 2014 at 12:19 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> On Thu, Sep 04, 2014 at 11:37:24PM -0700, Luis R. Rodriguez wrote:
>> +	/*
>> +	 * I got SIGKILL, but wait for 60 more seconds for completion
>> +	 * unless chosen by the OOM killer. This delay is there as a
>> +	 * workaround for boot failure caused by SIGKILL upon device
>> +	 * driver initialization timeout.
>> +	 *
>> +	 * N.B. this will actually let the thread complete regularly,
>> +	 * wait_for_completion() will be used eventually, the 60 second
>> +	 * try here is just to check for the OOM over that time.
>> +	 */
>> +	WARN_ONCE(!test_thread_flag(TIF_MEMDIE),
>> +		  "Got SIGKILL but not from OOM, if this issue is on probe use .driver.async_probe\n");
>> +	for (i = 0; i < 60 && !test_thread_flag(TIF_MEMDIE); i++)
>> +		if (wait_for_completion_timeout(&done, HZ))
>> +			goto wait_done;
> Ugh... Jesus, this is way too hacky, so now we fail on 90s timeout
> instead of 30?
Nope! I fell into the same trap, and only thanks to a lot of patience
from Tetsuo was I able to grok that the 60 seconds here are not for
increasing the timeout; this is just time spent checking to ensure
that the OOM killer wasn't what triggered the SIGKILL. Even if the
drivers took eons it should be fine now, I tried it :D
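
To make the loop's behavior easier to reason about, here is a rough
userspace model of it. This is only a sketch and not kernel code:
struct sim, the sim_* helpers, the outcome enum and
poll_after_sigkill() are all made up for illustration. They stand in
for test_thread_flag(TIF_MEMDIE) and
wait_for_completion_timeout(&done, HZ), with time counted in whole
simulated seconds. What happens after the loop is not modeled.

```c
#include <stdbool.h>

/* Hypothetical simulation state; none of this is a kernel API. */
struct sim {
	int done_at;	/* second at which completion arrives, -1 = never */
	int oom_at;	/* second at which TIF_MEMDIE gets set, -1 = never */
	int now;	/* simulated seconds elapsed since the SIGKILL */
};

/* Stand-in for test_thread_flag(TIF_MEMDIE). */
static bool sim_memdie(const struct sim *s)
{
	return s->oom_at >= 0 && s->now >= s->oom_at;
}

/*
 * Stand-in for wait_for_completion_timeout(&done, HZ): returns true
 * right away if the completion is already there, otherwise waits one
 * simulated second and reports whether it arrived by then.
 */
static bool sim_wait_one_second(struct sim *s)
{
	if (s->done_at >= 0 && s->now >= s->done_at)
		return true;
	s->now++;
	return s->done_at >= 0 && s->now >= s->done_at;
}

enum outcome {
	DONE_IN_WINDOW,		/* completion seen while polling */
	LEFT_WINDOW_OOM,	/* polling stopped early: OOM victim */
	LEFT_WINDOW_EXPIRED,	/* 60 polls passed, not an OOM victim */
};

/* Same shape as the quoted for loop: up to 60 one-second polls. */
static enum outcome poll_after_sigkill(struct sim *s)
{
	int i;

	for (i = 0; i < 60 && !sim_memdie(s); i++)
		if (sim_wait_one_second(s))
			return DONE_IN_WINDOW;

	return sim_memdie(s) ? LEFT_WINDOW_OOM : LEFT_WINDOW_EXPIRED;
}
```

In this model the loop ends early only when the completion shows up or
when the OOM flag is seen; the 60 polls are an observation window for
the flag.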
> Why do we even need this with the proposed async
> probing changes?
Ah -- well, without it the way we "find" drivers that need this new
"async feature" is by a bug report and folks saying their system can't
boot, or that their device doesn't come up. That's all. Tracing
this to systemd and a timeout was one of the ugliest things ever.
There are two insane bug reports you can go check:
mptsas was the first:
I had only Cc'd you on the newest gem, pata_marvell:
We can't seriously expect to be doing all this work for every driver.
A WARN_ONCE() would enable us to find the drivers that need this new
async probe "feature".