Re: frequent lockups in 3.18rc4

From: Linus Torvalds
Date: Fri Dec 12 2014 - 15:20:11 EST


On Fri, Dec 12, 2014 at 11:58 AM, David Lang <david@xxxxxxx> wrote:
>
> If the machine has NOHZ and has a cpu bound userspace task, it could take
> quite a while before userspace would trigger a reschedule (at least if I've
> understood the comments on this thread properly)

The thing is, we'd have to return to user space for that to happen.
And when we do that, we check the "should we schedule" flag again. So
races like this really shouldn't matter, but there could be something
kind-of-similar that just ends up causing a wakeup to be delayed.
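To make the "we check the flag again" argument concrete, here is a toy single-CPU model. It is not kernel code. The function `first_schedule` and its parameters are hypothetical, and `need_resched` only stands in for the "should we schedule" flag. A wakeup that loses the race with one return-to-user check gets picked up at the next one.

```python
# Toy model, NOT kernel code: one CPU running a CPU-bound user task.
# `need_resched` is a hypothetical stand-in for the "should we schedule" flag.

def first_schedule(kernel_entries, wakeup_after_entry):
    """Return the index of the kernel entry whose return-to-user path
    finally acts on a wakeup, or None if no such return happens.

    kernel_entries: how many times the task enters the kernel
        (timer ticks, syscalls, interrupts) in the simulated window.
    wakeup_after_entry: the wakeup sets the flag just AFTER this entry's
        check, i.e. it loses the race with that check.
    """
    need_resched = False
    for entry in range(kernel_entries):
        # Return-to-user path: the flag is tested again on every return.
        if need_resched:
            return entry  # this is where schedule() would be called
        if entry == wakeup_after_entry:
            need_resched = True  # racing wakeup lands after the check
    return None  # the task never went through the kernel again
```

For example, `first_schedule(10, 3)` returns 4. The lost race costs exactly one kernel entry. The only way to lose the wakeup for good is the `None` case, where the CPU never passes through the return path again. The model leaves out cross-CPU wakeups, which in the real kernel can interrupt the running task to force such an entry.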

But such a wakeup would need to be delayed for seconds (for the RCU
threads) or for tens of seconds (for the watchdog) to matter.

That just seems unlikely. Even the "very high load" case shouldn't
really matter: while it could delay one particular thread from
being scheduled, it shouldn't delay the next "should we schedule"
test. In fact, high load would normally be expected to make the next
"should we schedule" test come sooner.
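One way to see why high load should make the next "should we schedule" decision come sooner, rather than later, is a CFS-style timeslice calculation. This is a sketch under stated assumptions: the 6 ms latency target, the 0.75 ms minimum granularity, and equal-weight tasks are assumed unscaled defaults. Real kernels scale and tune these values, and the helper names here are hypothetical.

```python
# Assumed CFS-style defaults (unscaled; real kernels scale these by CPU
# count and expose them as tunables), with all tasks at equal weight.
SCHED_LATENCY_NS = 6_000_000      # target period for a few tasks
MIN_GRANULARITY_NS = 750_000      # smallest slice handed out
NR_LATENCY = SCHED_LATENCY_NS // MIN_GRANULARITY_NS  # 8 tasks

def slice_ns(nr_running):
    """Per-task timeslice for nr_running equal-weight runnable tasks."""
    if nr_running > NR_LATENCY:
        # Too many tasks to fit the target: stretch the period instead
        # of shrinking slices below the minimum granularity.
        period = nr_running * MIN_GRANULARITY_NS
    else:
        period = SCHED_LATENCY_NS
    return period // nr_running

def checks_within(delay_s, nr_running):
    """Rough count of slice-expiry preemption points in delay_s seconds."""
    return int(delay_s * 1_000_000_000) // slice_ns(nr_running)
```

More runnable tasks give shorter slices (6 ms for one task, 0.75 ms for eight or more), so preemption points arrive more often, not less. Even under these assumptions, `checks_within(20, 8)` is 26,666 decision points inside a 20-second window. In practice the check also rides on the timer tick, so real spacing is coarser, but even at 100 Hz twenty seconds is 2,000 ticks.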

But this is where some load calculation overflow might screw things
up, of course.
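A purely hypothetical illustration of how an overflow in a load calculation could invert decisions: a weight times runtime product kept in a 32-bit unsigned accumulator. The weight of 1024, the nanosecond units, and the names `load_u32`, `light` and `heavy` are assumptions chosen for illustration. They do not describe the actual 3.18 load-tracking code.

```python
# Hypothetical example: a load sum kept in a 32-bit unsigned accumulator.
# The weight (1024 for a nice-0 task) and the ns runtime units are
# illustrative assumptions, not a description of the actual 3.18 code.
U32_MASK = 0xFFFF_FFFF
NICE_0_WEIGHT = 1024

def load_u32(weight, runtime_ns):
    """weight * runtime_ns, truncated to 32 bits like a C u32 multiply."""
    return (weight * runtime_ns) & U32_MASK

light = load_u32(NICE_0_WEIGHT, 4_000_000)  # 4 ms: 4,096,000,000 still fits
heavy = load_u32(NICE_0_WEIGHT, 5_000_000)  # 5 ms: wraps to 825,032,704
```

After wrapping, the busier 5 ms task appears to carry roughly a fifth of the load of the 4 ms one. Any balancing or preemption decision made from such a value would then go the wrong way.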

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/