Re: [RFCv3 PATCH 30/48] sched: Calculate energy consumption of sched_group

From: Peter Zijlstra
Date: Mon Mar 23 2015 - 12:47:20 EST


On Mon, Mar 16, 2015 at 02:15:46PM +0000, Morten Rasmussen wrote:
> You are absolutely right. The current code is broken for system
> topologies where all cpus share the same clock source. To be honest, it
> is actually worse than that and you already pointed out the reason. We
> don't have a way of representing top level contributions to power
> consumption in RFCv3, as we don't have sched_group spanning all cpus in
> a single cluster system. For example, we can't represent L2 cache and
> interconnect power consumption on such systems.
>
> In RFCv2 we had a system wide sched_group dangling by itself for that
> purpose. We chose to remove that in this rewrite as it led to messy
> code. In my opinion, a more elegant solution is to introduce an
> additional sched_domain above the current top level which has a single
> sched_group spanning all cpus in the system. That should fix the
> SD_SHARE_CAP_STATES problem and allow us to attach power data for the
> top level.

Maybe remind us why this needs to be tied to sched_groups? Why can't we
attach the energy information to the domains?

There is an additional problem with groups you've not yet discovered and
that is overlapping groups. Certain NUMA topologies result in this.
There the sum of cpus over the groups is greater than the total cpus in
the domain.
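
To make the overlap concrete, here is a toy sketch. It is not kernel
code: the bitmask representation and the helper names (weight,
sum_group_weights, domain_weight) are made up for illustration and only
stand in for cpumask operations on real sched_groups.

```c
#include <stddef.h>

/* Hypothetical toy model: a group is a plain bitmask of cpu numbers. */
static unsigned int weight(unsigned long mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit until none are left. */
	while (mask) {
		mask &= mask - 1;
		n++;
	}
	return n;
}

/* Sum of the per-group cpu counts; a cpu in two groups is counted twice. */
static unsigned int sum_group_weights(const unsigned long *groups, size_t nr)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		sum += weight(groups[i]);
	return sum;
}

/* Number of distinct cpus in the domain, i.e. the union of all groups. */
static unsigned int domain_weight(const unsigned long *groups, size_t nr)
{
	unsigned long span = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		span |= groups[i];
	return weight(span);
}
```

With three made-up groups { 0x3, 0x6, 0xc }, that is cpus {0,1}, {1,2}
and {2,3}, domain_weight() gives 4 while sum_group_weights() gives 6.
Anything accumulated per group would presumably count cpus 1 and 2
twice in such a topology.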
