Re: Crashes with 874bbfe600a6 in 3.18.25

From: Mike Galbraith
Date: Fri Feb 05 2016 - 15:47:22 EST


On Fri, 2016-02-05 at 11:49 -0500, Tejun Heo wrote:
> Hello, Mike.
>
> On Thu, Feb 04, 2016 at 03:00:17AM +0100, Mike Galbraith wrote:
> > Isn't it the case that, currently at least, each and every spot that
> > requires execution on a specific CPU yet does not take active measures
> > to deal with hotplug events is in fact buggy? The timer code clearly
> > states that the user is responsible, and so do both workqueue.[ch].
>
> Yeah, the usages which require affinity for correctness must flush the
> work items from a cpu down callback.

Good, we agree. Now bear with me a moment..

That very point is what makes it wrong for the workqueue code to ever
target a work item. The instant it does target selection, correctness
may be at stake, it doesn't know, thus it must assume the full onus,
which it has neither the knowledge nor the time to do. That's how we
exploded on node = -1, trying to help out the user by doing his job,
but then not doing the whole job. IMHO, a better plan is to let the
user screw it up all by himself.

-Mike