Re: [patch 0/2] RFC sched: Change nohz ilb logic from poll to pushmodel

From: Pallipadi, Venkatesh
Date: Thu Jun 18 2009 - 19:43:05 EST


On Wed, 2009-06-17 at 12:16 -0700, Vaidyanathan Srinivasan wrote:
> * venkatesh.pallipadi@xxxxxxxxx <venkatesh.pallipadi@xxxxxxxxx> [2009-06-17 11:26:49]:
>
> > Existing nohz idle load balance (ilb) logic uses the pull model, with one
> > idle load balancer CPU nominated on any partially idle system and that
> > balancer CPU not going into nohz mode. With the periodic tick, the
> > balancer does the idle balancing on behalf of all the CPUs in nohz mode.
> >
> > This is not optimal and has a few issues:
> > * the balancer will continue to have periodic ticks and wake up
> > frequently (HZ rate), even though it may not have any rebalancing to do on
> > behalf of any of the idle CPUs.
> > * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic
> > wakeup can result in an additional interrupt on a CPU doing the timer
> > broadcast.
> > * The balancer may end up spending a lot of time doing the balancing on
> > behalf of nohz CPUs, especially with increasing number of sockets and
> > cores in the platform.
> >
> > The alternative is to have a push model, where all idle CPUs can enter nohz
> > mode and a busy CPU kicks one of the idle CPUs to take care of idle balancing
> > on behalf of a group of idle CPUs.
>
> Hi Venki,
>
> The idea is very useful and further extends the power savings in an idle
> system. However, the kick method from a busy CPU should not add to
> scheduling latency during a sudden burst of work.
>
> Does adding nohz_balancer_kick() in trigger_load_balance() path in
> a busy CPU add to its overhead?
>
>
> > The following patches try that approach. There are still some rough edges
> > in the patches related to the use of #defines around the code. But I wanted
> > to get opinions on this approach as an RFC (not for inclusion into the
> > tree yet).
>
> I like the idea but my only concern is the performance impact on busy
> cpus with this push model.

Vaidy,

I tried to keep the overhead on the busy CPU low in this RFC. The busy
CPU checks the next_balance time and, if a load balance CPU is already
nominated, just sends a resched to that CPU. Only when there is no
assigned load balance CPU (say, the load balance CPU started running
and no other CPU has nominated itself yet) do we look at the cpu_mask
to find the first bit set. That is the only overhead on the busy side;
all the other complexity is handled on the idle CPU side.
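
To make the busy-CPU cost concrete, here is a rough userspace sketch of
the path described above. It is not the code from the patches: struct
nohz_state, NO_ILB, send_resched(), first_idle_cpu() and
nohz_balancer_kick_sketch() are hypothetical names, a 64-bit word
stands in for the cpu_mask, and send_resched() only records its target
instead of sending a real IPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_ILB (-1)   /* hypothetical marker: no idle load balancer nominated */

/* Hypothetical stand-in for the shared nohz state; not the patch's layout. */
struct nohz_state {
    int load_balancer;           /* nominated idle balancer CPU, or NO_ILB */
    uint64_t idle_mask;          /* bit n set => CPU n is idle in nohz mode */
    unsigned long next_balance;  /* time at which idle balancing is next due */
};

/* Stand-in for sending a resched IPI; here it only records the target. */
static int last_kicked = -1;
static void send_resched(int cpu) { last_kicked = cpu; }

/* Index of the lowest set bit, or -1 if the mask is empty. */
static int first_idle_cpu(uint64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return -1;
}

/* Busy-CPU path: returns the CPU that was kicked, or -1 if none was. */
static int nohz_balancer_kick_sketch(struct nohz_state *s, unsigned long now)
{
    int cpu;

    assert(s != NULL);

    /* Wrap-safe "now is before next_balance": nothing is due yet. */
    if ((long)(now - s->next_balance) < 0)
        return -1;

    /* Common case: a balancer is already nominated, so just use it. */
    cpu = s->load_balancer;

    /* Rare case: no nominee, so scan the idle mask for the first set bit. */
    if (cpu == NO_ILB)
        cpu = first_idle_cpu(s->idle_mask);
    if (cpu < 0)
        return -1;   /* no idle CPU to kick */

    send_resched(cpu);
    return cpu;
}
```

In this sketch the common case costs one time comparison and one
resched; the mask scan runs only in the window where no balancer is
nominated.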

Thanks,
Venki
