Re: [RFC PATCH] sched: Pass affine target cpu into wake_affine

From: Lin Ming
Date: Mon Jan 04 2010 - 06:30:55 EST


On Mon, 2010-01-04 at 18:59 +0800, Mike Galbraith wrote:
> On Mon, 2010-01-04 at 10:32 +0100, Peter Zijlstra wrote:
> > On Mon, 2010-01-04 at 17:12 +0800, Lin Ming wrote:
> > > On Mon, 2010-01-04 at 17:25 +0800, Peter Zijlstra wrote:
> > > > On Mon, 2010-01-04 at 17:03 +0800, Lin Ming wrote:
> > > > > commit a03ecf08d7bbdd979d81163ea13d194fe21ad339
> > > > > Author: Lin Ming <ming.m.lin@xxxxxxxxx>
> > > > > Date: Mon Jan 4 14:14:50 2010 +0800
> > > > >
> > > > > sched: Pass affine target cpu into wake_affine
> > > > >
> > > > > Since commit a1f84a3(sched: Check for an idle shared cache in select_task_rq_fair()),
> > > > > the affine target maybe adjusted to any idle cpu in cache sharing domains
> > > > > instead of current cpu.
> > > > > But wake_affine still use current cpu to calculate load which is wrong.
> > > > >
> > > > > This patch passes affine cpu into wake_affine.
> > > > >
> > > >
> > > > Does this at all help with that regression?
> > >
> > > No.
> >
> > crap :/
> >
> > The change does look sensible though.
>
> I piddled with all kinds of ways to get around calling wake_affine()
> entirely, and/or calling it with the affine candidate to no avail. Best
> result was always to do the silly looking thing, namely test the current
> cpu for wake affine decision, but slip in the shared cache cpu.
>
> I bet the below helps, though there will still be cache misses, so there
> will still be pain for extreme switchers. Question is whether the
> ramp-up gain is worth it. I think yes, since it's up to 100%. Would be
> most excellent to find a way to know in advance when the cost will be
> too high, and then not go there. Same applies for doing the affinity
> decision every time for extreme switchers. It's expensive for those,
> especially so when they're pinned, but pays in the general case.
>
> Anyway...
>
> PREFER_SIBLING is set at the CPU domain level if you don't have power
> saving set, so you get to eat cache misses for each cpu, whether it's
> sharing a cache or not as you traverse. Lots of CPUs, LOTS of pain.
>
> not-signed-off

Nice.
I did a quick test and it does fix the regression.
And actually, it achieves +30% improvement for the quick test with this
patch applied to 2.6.33-rc2, compared with 2.6.32.

I'll do more test to confirm the improvement.

Thanks,
Lin Ming

>
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 57e6357..5b81156 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -99,7 +99,7 @@ int arch_update_cpu_topology(void);
> | 1*SD_WAKE_AFFINE \
> | 1*SD_SHARE_CPUPOWER \
> | 0*SD_POWERSAVINGS_BALANCE \
> - | 0*SD_SHARE_PKG_RESOURCES \
> + | 1*SD_SHARE_PKG_RESOURCES \
> | 0*SD_SERIALIZE \
> | 0*SD_PREFER_SIBLING \
> , \
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 42ac3c9..8fe7ee8 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -1508,7 +1508,7 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
> * If there's an idle sibling in this domain, make that
> * the wake_affine target instead of the current cpu.
> */
> - if (tmp->flags & SD_PREFER_SIBLING)
> + if (tmp->flags & SD_SHARE_PKG_RESOURCES)
> target = select_idle_sibling(p, tmp, target);
>
> if (target >= 0) {
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/