Re: Crashes with 874bbfe600a6 in 3.18.25
From: Tejun Heo
Date: Wed Feb 03 2016 - 14:02:08 EST
Hello, Thomas.
On Wed, Feb 03, 2016 at 07:46:11PM +0100, Thomas Gleixner wrote:
> > > So I think 874bbfe600a6 is really bogus. It should be reverted. We
> > > already have a proper fix for vmstat 176bed1de5bf ("vmstat: explicitly
> > > schedule per-cpu work on the CPU we need it to run on"), which
> > > should be used for the stable trees as a replacement.
> >
> > It's not bogus. We can't flip a property that has been guaranteed
> > without any provision for verification. Why do you think vmstat blew
> > up in the first place? vmstat would be the canary case as it runs
> > frequently on all systems. It's exactly the sign that we can't break
> > this guarantee willy-nilly.
>
> You're in complete failure denial mode once again.
Well, you're in an unnecessary escalation mode as usual. Was the
attitude really necessary? Chill out and read the thread again.
Michal is saying the dwork->cpu assignment was bogus and I was
refuting that.
> Fact is:
>
> That patch breaks stuff because there is no stable cpu -> node mapping
> across cpu on/offlining. As a result this selects unbound_pwq_by_node() on
> node -1.
>
> The reason why you need to do that work->cpu assignment might be legitimate,
> but that does not justify that you expose systems to a lurking out of bounds
> access which results in a NULL pointer dereference.
>
> As long as cpu_to_node(cpu) can return -1, we need a sanity check there. And
> we need that now and not at some point in the future when the patches
> establishing a stable cpu -> node mapping are finished.
>
> Stop arguing around a bug which really exists and was exposed by this patch.
Michal brought it up here, but there's a different thread where Mike
reported the NUMA_NO_NODE issue and I already posted the fix.
http://lkml.kernel.org/g/20160203185425.GK14091@xxxxxxxxxxxxxxx
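For readers following along, here is a minimal user-space sketch of the kind of sanity check Thomas is asking for. It assumes simplified, hypothetical stand-ins for the workqueue structures; the struct layouts and lookup_pwq_by_node() are illustrations only, not the kernel code and not the posted patch. When cpu_to_node() has yielded NUMA_NO_NODE (-1), the lookup falls back to a default pool_workqueue instead of indexing the per-node table with -1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; simplified for illustration. */
#define NUMA_NO_NODE (-1)
#define MAX_NUMNODES 4

struct pool_workqueue {
	int node; /* node this pwq serves; -1 for the default pwq */
};

struct workqueue_struct {
	struct pool_workqueue *dfl_pwq;                    /* fallback pwq */
	struct pool_workqueue *numa_pwq_tbl[MAX_NUMNODES]; /* per-node pwqs */
};

/*
 * Sanity-checked per-node lookup (hypothetical helper). Without the
 * NUMA_NO_NODE check, node == -1 would read numa_pwq_tbl[-1], an
 * out-of-bounds access whose result may be garbage or NULL, which the
 * caller would then dereference.
 */
static struct pool_workqueue *
lookup_pwq_by_node(struct workqueue_struct *wq, int node)
{
	if (node == NUMA_NO_NODE)
		return wq->dfl_pwq;

	/* Any other value must be a valid index into the per-node table. */
	assert(node >= 0 && node < MAX_NUMNODES);
	return wq->numa_pwq_tbl[node];
}
```

Falling back to the default pwq is only one possible design choice; it keeps the work item runnable across CPU offlining rather than failing the queueing, until a stable cpu -> node mapping makes the check unnecessary.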
Thanks.
--
tejun