Re: [RFC PATCH v2 0/2] Saving power by cpu evacuation sched_max_capacity_pct=n

From: Vaidyanathan Srinivasan
Date: Fri May 22 2009 - 05:15:35 EST


* Pavel Machek <pavel@xxxxxx> [2009-05-19 22:40:15]:

> On Wed 2009-05-13 17:01:00, Andi Kleen wrote:
> > > From what I've been told it's popular to over-commit the cooling capacity
> > > in a rack, so that a number of servers can run at full thermal capacity
> > > but not all.
> >
> > Yes. But in this case you don't want to use throttling, you want
> > to use p-states which actually save power unlike throttling.
> >
> > > I've also been told that hardware sucks at throttling,
> >
> > Throttling is not really something you should use in normal
> > operation, it's just an emergency measure. For that it works
> > quite well, but you really don't want it in normal operation.
> >
> > > therefore people
> > > want to fix the OS so as to limit the thermal capacity and avoid the
> > > hardware throttle from kicking in, whilst still not exceeding the rack
> > > capacity or similar nonsense.
> >
> > Yes that's fine and common, but you actually need to save power for this,
> > which throttling doesn't do.
>
> Actually throttling will lower power consumption at any given moment
> (not power consumption for any given task!) and will keep your rack
> from melting.

Yes, we want to reduce overall power consumption.

> But I don't see why it is necessary to evacuate cores for this. Why
> not just schedule special task that enters C3 instead of computing?

This is essentially what happens in the load balancer approach. A core
that is not given any tasks runs the scheduler's idle task, which
transitions the core to its lowest power state. Pinning a user space
task and using a special driver to hold the core in C3 would break
scheduling fairness, because the application, not the scheduler,
decides when to give the core back.

> That was what I planned to do on athlon 900 (1 core) with broken
> fan...
>
> For what you are doing, cpu hotplug seems more suitable. Can you
> enhance it so that it is fast enough for you?

Yes, the cpu hotplug framework can be used. That is definitely an
alternative to this approach. However, with cpu hotplug the
evacuation is directed at a particular core, which may affect user
space affinity and cpusets. With this approach we can limit the
overall system capacity, for example run at most 7 cores at a time in
an 8 core system, without caring which particular core is 'forced to
idle' at any given point in time.
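
To make the capacity limit concrete, here is a rough userspace sketch
of the arithmetic, not the patch itself. The function names, the
round-down rule and the "idle the least loaded cores" policy are all
assumptions made for illustration; the point is only that the limit
fixes how many cores may run, not which ones.

```python
def max_active_cores(total_cores: int, capacity_pct: int) -> int:
    """Number of cores allowed to run under a percentage capacity limit.

    Assumption: the limit rounds down, and at least one core always
    stays active. The real patch may round differently.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be >= 1")
    if not 0 < capacity_pct <= 100:
        raise ValueError("capacity_pct must be in 1..100")
    return max(1, total_cores * capacity_pct // 100)


def pick_cores_to_idle(load_per_core: list[int], capacity_pct: int) -> list[int]:
    """Choose which cores to leave idle; any core is a valid candidate.

    Hypothetical policy: idle the least loaded cores first, so the
    fewest tasks have to move. Ties go to the lower core number.
    """
    total = len(load_per_core)
    n_idle = total - max_active_cores(total, capacity_pct)
    by_load = sorted(range(total), key=lambda cpu: load_per_core[cpu])
    return sorted(by_load[:n_idle])


if __name__ == "__main__":
    loads = [5, 0, 3, 9, 1, 7, 2, 4]  # made-up per-core load figures
    print(max_active_cores(len(loads), 88))
    print(pick_cores_to_idle(loads, 88))
```

With cpu hotplug the caller would have to name the core to take
offline up front; here the set of idle cores can change from one
balancing interval to the next without touching affinity masks.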

Further discussion regarding this can be found in the following
thread: http://lkml.org/lkml/2009/5/19/54

--Vaidy
