Re: Crashes with 874bbfe600a6 in 3.18.25

From: Tejun Heo
Date: Fri Feb 05 2016 - 11:49:30 EST


Hello, Mike.

On Thu, Feb 04, 2016 at 03:00:17AM +0100, Mike Galbraith wrote:
> Isn't it the case that, currently at least, each and every spot that
> requires execution on a specific CPU yet does not take active measures
> to deal with hotplug events is in fact buggy? The timer code clearly
> states that the user is responsible, and so do both workqueue.[ch].

Yeah, the usages which require affinity for correctness must flush the
work items from a CPU down callback.

> I was surprised to hear that some think they have an iron clad
> guarantee, given the null and void clause is prominently displayed.

Nobody is (or at least should be) expecting workqueue to handle
affinity across CPU offlining events. That is not the problem. The
problem is that currently queue_work(work) and
queue_work_on(smp_processor_id(), work) are identical and there likely
are affinity-for-correctness users which are doing the former.

Thanks.

--
tejun