Re: [RFCv3 PATCH 36/48] sched: Count number of shallower idle-states in struct sched_group_energy

From: Morten Rasmussen
Date: Tue Mar 24 2015 - 13:13:08 EST


On Tue, Mar 24, 2015 at 01:14:39PM +0000, Peter Zijlstra wrote:
> On Wed, Feb 04, 2015 at 06:31:13PM +0000, Morten Rasmussen wrote:
> > cpuidle associates all idle-states with each cpu while the energy model
> > associates them with the sched_group covering the cpus coordinating
> > entry to the idle-state. To get idle-state power consumption it is
> > therefore necessary to translate from cpuidle idle-state index to energy
> > model index. For this purpose it is helpful to know how many idle-states
> > that are listed in lower level sched_groups (in struct
> > sched_group_energy).
>
> I think this could use some text to describe how that number is useful.
>
> I suspect is has something to do with bigger domains having more idle
> modes (package C states etc..).

Close :)

You are not the first to be confused about the idle state representation
and numbering. Maybe I should just change it.

If we take typical ARM idle-states as an example, we have both per-cpu
and per-cluster idle-states. Unlike x86 (IIUC), cluster states are
controlled by cpuidle. All states are represented in the cpuidle state
table for each cpu regardless of whether they are per-cpu or per-cluster
states. For the energy model we have organized them by attaching the
states to the cpumask representing the cpus that need to coordinate to
enter the state, as this is rather important to know when estimating
energy consumption.

Idle-state            cpuidle    Energy model table indices
                      index      per-cpu sg    per-cluster sg
WFI                   0          0             -
Core power-down       1          1             -
Cluster power-down    2          -             0

Cluster power-down is the first (and only in this example) per-cluster
idle-state and is therefore put in the idle-state table for the
sched_group spanning the whole cluster. Since it is first it has index
0. However, the same state has index 2 in cpuidle as it only has a table
per cpu. To do an easy translation from cpuidle index to energy model
idle-state table index it is therefore quite useful to know how many
states are in the tables of the energy model attached to groups at
lower levels. Basically, energy_model_idx = cpuidle_idx - state_below,
which is 2 - 2 = 0 for cluster power-down.
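
As a rough sketch of that translation (not the code from the patch; the
struct and its field names below are made up for illustration):

```c
#include <assert.h>

/*
 * Hypothetical, simplified stand-in for the per-group energy data.
 * The names are illustrative assumptions, not the patch's definitions.
 */
struct sg_energy_sketch {
	int nr_idle_states;        /* idle-states listed for this group */
	int nr_idle_states_below;  /* idle-states listed in lower level groups */
};

/* energy_model_idx = cpuidle_idx - state_below */
static int em_idle_idx(const struct sg_energy_sketch *sge, int cpuidle_idx)
{
	int idx = cpuidle_idx - sge->nr_idle_states_below;

	/* The cpuidle state must be one that belongs to this group's table. */
	assert(idx >= 0 && idx < sge->nr_idle_states);
	return idx;
}
```

With the example table, the per-cpu group would have 2 states and 0
below, and the per-cluster group 1 state and 2 below, so cpuidle index 2
maps to index 0 in the per-cluster table.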

An alternative that could avoid this translation is to have a full table
at each level (3 entries for this example) and insert dummy values on
indices not applicable to the group the table is attached to. For
example insert '0' on index=2 for the per-cpu sg energy model data.
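
Something like the following is what I have in mind for that
alternative. It is only a sketch: the table names are hypothetical and
the power numbers are arbitrary placeholders, not measurements.

```c
#include <assert.h>

#define NR_CPUIDLE_STATES 3	/* WFI, core power-down, cluster power-down */

/*
 * Full-size tables at each level, indexed directly by cpuidle index.
 * Placeholder values only; 0 is the dummy for "not applicable here".
 */
static const unsigned long cpu_sg_idle_power[NR_CPUIDLE_STATES]     = { 4, 2, 0 };
static const unsigned long cluster_sg_idle_power[NR_CPUIDLE_STATES] = { 0, 0, 7 };

/* No translation needed: the cpuidle index is used as-is. */
static unsigned long idle_power(const unsigned long *table, int cpuidle_idx)
{
	assert(cpuidle_idx >= 0 && cpuidle_idx < NR_CPUIDLE_STATES);
	return table[cpuidle_idx];
}
```

The cost is a few wasted entries per table in exchange for a direct
lookup.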

We can't avoid index translation entirely though. To estimate energy
consumption we also need to know the cluster power consumption when all
cpus are in state 0 or 1 but the cluster itself is still powered up,
i.e. idle but not in a cluster idle-state. The energy model therefore has
an additional 'active idle' idle-state for the cluster which sits before
the first true idle-state in the energy model idle-state table. In the
example above, active idle would be per-cluster sg energy model table
index 0 and cluster power-down index 1.
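
To make the 'active idle' numbering concrete, here is a minimal sketch
of how the per-cluster lookup could work. The function and parameter
names are hypothetical, and it assumes cpuidle_idx is the state the cpus
in the cluster have entered together:

```c
#include <assert.h>

/*
 * Hypothetical per-cluster translation where table index 0 is
 * 'active idle' and the true cluster idle-states follow from index 1.
 */
static int cluster_em_idle_idx(int cpuidle_idx, int states_below,
			       int nr_cluster_states)
{
	assert(cpuidle_idx >= 0 &&
	       cpuidle_idx < states_below + nr_cluster_states);

	/* Only per-cpu states entered: the cluster itself is still up. */
	if (cpuidle_idx < states_below)
		return 0;

	/* Skip over the active idle entry at index 0. */
	return cpuidle_idx - states_below + 1;
}
```

For the example (two per-cpu states below, one cluster state), cpuidle
indices 0 and 1 both give active idle (index 0) and cpuidle index 2
gives cluster power-down (index 1).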