Re: Crashes with 874bbfe600a6 in 3.18.25

From: Mike Galbraith
Date: Wed Feb 03 2016 - 21:00:27 EST


On Wed, 2016-02-03 at 12:06 -0500, Tejun Heo wrote:
> On Wed, Feb 03, 2016 at 06:01:53PM +0100, Mike Galbraith wrote:
> > Hm, so it's ok to queue work to an offline CPU? What happens if it
> > doesn't come back for an eternity or two?
>
> Right now, it just loses affinity. A more interesting case is a cpu
> going offline while work items bound to the cpu are still running and
> the root problem is that we've never distinguished between affinity
> for correctness and optimization and thus can't flush or warn on the
> stragglers. The plan is to ensure that all correctness users specify
> the CPU explicitly. Once we're there, we can warn on illegal usages.

Isn't it the case that, currently at least, each and every spot that
requires execution on a specific CPU yet does not take active measures
to deal with hotplug events is in fact buggy? The timer code clearly
states that the user is responsible, and so do both workqueue.[ch].

I was surprised to hear that some think they have an ironclad
guarantee, given the null and void clause is prominently displayed.

-Mike