Re: [PATCH] random: Fix kernel panic due to system_wq use before init

From: Waiman Long
Date: Wed Sep 14 2016 - 18:16:00 EST


On 09/14/2016 05:06 PM, Linus Torvalds wrote:
> On Wed, Sep 14, 2016 at 12:34 PM, Waiman Long <waiman.long@xxxxxxx> wrote:
>> I can try, but the 16-socket system that I have at the moment takes a long
>> time (more than an hour) for one shutdown-reboot cycle. It may not really
>> be more interrupts in 4.8; it may be that the random driver just somehow
>> runs very slowly on my test machine, as it seems to have had a major
>> rewrite in the 4.8 cycle.
> Looking at the random driver updates since 4.7, the only thing I see
> is that .crng_fast_load() for the chacha20 randomness. And that should
> trigger only until it's been initialized, so the cost looks like it
> should be limited.
>
> Is there some fundamental reason you think it's the random driver?
> Other than the oops? Because I'd be more inclined to suspect just some
> apic issue or something, where an actual interrupt line ends up
> screaming or whatever. Is this UV? There's also the CPU hotplug state
> machine changes etc.

Yes, it is because of the oops that I suspect the random driver may be the cause.


> But a few rounds of bisecting should hopefully cut down on the
> suspects a lot. A *full* bisect might be 16-17 rounds, but if you can
> do just four or five rounds of bisection, that should still cut it
> down from 14k commits to "only" several hundred..
>
> Linus
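
For reference, the arithmetic behind the "several hundred" figure can be sketched as below. This is only an idealized model, assuming each bisection round halves the suspect range exactly; the function names are made up for illustration, and 14,000 is just the round "14k" figure quoted above. Real bisects over merge-heavy history (or with commits that have to be skipped) can take more steps than this model predicts, which may be why the quoted estimate for a full bisect is higher.

```python
import math


def remaining_after(total_commits: int, rounds: int) -> int:
    """Upper bound on suspect commits left after `rounds` ideal halvings."""
    remaining = total_commits
    for _ in range(rounds):
        # Each round tests the midpoint and discards half of the range.
        remaining = math.ceil(remaining / 2)
    return remaining


def rounds_to_isolate(total_commits: int) -> int:
    """Ideal number of halvings needed to get down to a single commit."""
    # Integer form of ceil(log2(n)), avoiding floating-point rounding.
    return (total_commits - 1).bit_length()


TOTAL = 14_000  # the "14k commits" mentioned above, taken as a round number

if __name__ == "__main__":
    for rounds in (4, 5):
        print(f"after {rounds} rounds: <= {remaining_after(TOTAL, rounds)} commits")
    print(f"ideal full bisect: {rounds_to_isolate(TOTAL)} rounds")
```

With a reboot cycle of more than an hour per round, four or five rounds is roughly a working day of testing under these assumptions.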

Yes, I will do a few rounds to see if we can isolate the problem. In the meantime, I will also reconfigure the system with fewer sockets to see if the problem can be reproduced in a smaller configuration.

Cheers,
Longman