Re: [RFC] (How to) Let idle CPUs sleep

From: Nick Piggin
Date: Mon May 09 2005 - 01:29:46 EST


Srivatsa Vaddagiri wrote:
On Sun, May 08, 2005 at 02:14:23PM +1000, Nick Piggin wrote:

Yeah probably something around that order of magnitude. I suspect
you will quickly reach a point where either other timers go off
more frequently anyway, and/or you simply get rapidly diminishing
returns on the amount of power saved by increasing the period.


I am also looking at it from the other perspective, i.e. a virtualized
environment. Any number of unnecessary timer ticks will lead to an
equivalent number of unnecessary context switches among the guest OSes.


Yep.


It is not so much a matter of "fixing" the scheduler as just adding
more heuristics. When are we too busy? When should we wake another
CPU? What if that CPU is an SMT sibling? What if it is across the
other side of the topology, and other CPUs closer to it are busy
as well? What if they're busy but not as busy as we are? etc.

We've already got that covered in the existing periodic pull balancing,
so instead of duplicating this logic and moving this extra work to busy
CPUs, we can just use the existing framework.
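
To make the pull model concrete, here is a rough user-space sketch of the
idea, not the kernel's actual code. The struct cpu_load type and both
function names are hypothetical, and the "move half the imbalance" rule is
an assumption chosen for illustration. The point it shows is that the
decision runs on the CPU that wants work, on its own tick, so busy CPUs do
nothing extra.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU load snapshot; not a kernel structure. */
struct cpu_load {
    unsigned int nr_running;    /* runnable tasks on this CPU */
};

/*
 * Return the index of the busiest CPU other than 'self', or -1 if no
 * other CPU has more than one runnable task (nothing worth pulling).
 */
static int find_busiest_cpu(const struct cpu_load *cpus, size_t ncpus,
                            size_t self)
{
    int busiest = -1;
    unsigned int max_load = 1;  /* a source must exceed this to qualify */

    assert(cpus != NULL && self < ncpus);
    for (size_t i = 0; i < ncpus; i++) {
        if (i == self)
            continue;
        if (cpus[i].nr_running > max_load) {
            max_load = cpus[i].nr_running;
            busiest = (int)i;
        }
    }
    return busiest;
}

/*
 * Pull step, run by CPU 'self' on its own periodic tick: move half of
 * the imbalance from the busiest CPU to 'self'. Returns the number of
 * tasks moved (possibly 0).
 */
static unsigned int pull_balance(struct cpu_load *cpus, size_t ncpus,
                                 size_t self)
{
    int src = find_busiest_cpu(cpus, ncpus, self);
    unsigned int imbalance, moved;

    if (src < 0 || cpus[src].nr_running <= cpus[self].nr_running)
        return 0;

    imbalance = cpus[src].nr_running - cpus[self].nr_running;
    moved = imbalance / 2;
    cpus[src].nr_running -= moved;
    cpus[self].nr_running += moved;
    return moved;
}
```

The topology questions above (SMT siblings, distance, relative busyness)
would all live inside the "find busiest" step, which is why reusing one
copy of that logic matters.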


I don't think we have to duplicate the logic, just "reuse" whatever logic
exists (in find_busiest_group etc). However I do agree there is movement

OK, that may possibly be an option... however:

of extra work to busy CPUs, but that is only to help the idle CPU sleep longer.
Whether it justifies the additional complexity or not is what this RFC is
about I guess!


Yeah, this is a bit worrying. In general we should not be loading
up busy CPUs with any more work, and sleeping idle CPUs should be
done as a blunt "slowpath" operation, i.e. something that works well
enough.

FWIW, I have also made some modifications to the original proposal to
reduce the watchdog workload: instead of the same non-idle CPU waking up
all the sleeping CPUs it finds in the same rebalance_tick, the task is
spread over multiple non-idle CPUs in different rebalance_ticks.
A new (lightly tested) patch is in the mail below.


Mmyeah, I'm not a big fan :)

I could probably find some time to do my implementation if you have
a complete working patch for, e.g., UML.



At least we should try method A first, and if that isn't good enough
(though I suspect it will be), then think about adding more complexity
to the scheduler.


What would be good to measure between the two approaches is the CPU
utilization (over a period of time, say 10 hrs) of somewhat lightly
loaded SMP guest OSes (i.e. some CPUs are idle and other CPUs of the
same guest are not idle), when multiple such guest OSes are running
simultaneously on the same box. This means I need a port of VST to UML :(
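
For the comparison itself, utilization over the measurement window can be
computed from two samples of cumulative busy and idle time. This is only a
sketch: the struct cpu_times type and the function name are hypothetical,
and how the counters are obtained (for example from host-side accounting
of each guest) is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cumulative time counters for one CPU or guest, in ticks. */
struct cpu_times {
    uint64_t busy;  /* time spent doing work */
    uint64_t idle;  /* time spent idle */
};

/*
 * Utilization in percent over the interval between two samples.
 * Returns 0.0 if no time elapsed between the samples.
 */
static double cpu_utilization_pct(const struct cpu_times *start,
                                  const struct cpu_times *end)
{
    uint64_t busy, idle, total;

    assert(start != NULL && end != NULL);
    assert(end->busy >= start->busy && end->idle >= start->idle);

    busy = end->busy - start->busy;
    idle = end->idle - start->idle;
    total = busy + idle;
    if (total == 0)
        return 0.0;
    return 100.0 * (double)busy / (double)total;
}
```

Taking one sample at the start and one at the end of the run for each
approach gives two percentages that can be compared directly.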


Yeah that would be good.

--
SUSE Labs, Novell Inc.
