Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

From: Mike Galbraith
Date: Thu Jan 24 2013 - 02:47:14 EST


On Thu, 2013-01-24 at 15:15 +0800, Michael Wang wrote:
> On 01/24/2013 02:51 PM, Mike Galbraith wrote:
> > On Thu, 2013-01-24 at 14:01 +0800, Michael Wang wrote:
> >
> >> I've enabled the WAKE flag on my box like you did, but still can't see
> >> the regression. I've also just tested on a Power server with 64 CPUs
> >> and failed to reproduce the issue (not compared with virgin yet, but I
> >> can't see a collapse).
> >
> > I'm not surprised. I'm seeing enough inconsistent crap to come to the
> > conclusion that stock scheduler knobs flat can't be used on a largish
> > box, they're just too preempt-happy, leading to weird crap.
> >
> > My 2 missing nodes came back, and the very same kernel that highly
> > repeatably collapsed with 2 nodes does not with 4 nodes, and 2 nodes
> > does not collapse with only preemption knob tweaking, and that's
> > bullshit. Virgin shows instability in the mid-range, make a tiny tweak
> > that should have little if any effect there, and that instability
> > vanishes entirely. Test runs are not consistent enough boot to boot etc
> > etc. Either stock knobs suck on NUMA boxen, or this box is possessed.
>
> Mike, I wonder whether changing back to the old way makes the collapse
> go away not because there is a logic error in the new balance path, but
> simply because it changes the cost of select_task_rq(). Whether that
> cost goes up or down, it accidentally achieves the same effect as your
> knob tweak, and that is why the old way looks better than the new one.

That's what I'm saying: it's a useless crap side effect of a
preempt-happy kernel. Results with these knobs are just not stable. They
go wildly unstable with 2 nodes vs 4 in this box, but can be stabilized
in all cases with preemption knob adjustment... or the phase of the moon
might make them appear stable... or not.

-Mike
