On 20 March 2014 18:18, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
On 20/03/14 17:02, Vincent Guittot wrote:
On 20 March 2014 13:41, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
On 19/03/14 16:22, Vincent Guittot wrote:
We replace the old way of configuring the scheduler topology with a new method
which enables a platform to declare additional levels (if needed).
We still have a default topology table definition that can be used by platforms
that don't want more levels than the SMT, MC, CPU and NUMA ones. This table can
be overwritten by an arch which either wants to add a new level where load balancing
makes sense, like a BOOK or powergating level, or wants to change the flags
configuration of some levels.
For each level, we need a function pointer that returns the cpumask for each cpu,
a function pointer that returns the flags for the level, and a name. Only flags
that describe topology can be set by an architecture. The current topology
flags are:
SD_SHARE_CPUPOWER
SD_SHARE_PKG_RESOURCES
SD_NUMA
SD_ASYM_PACKING
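As a rough userspace sketch of the table described above (not the kernel code itself): each level carries a mask function, a flags function and a name. The struct name topo_level, the helper level_flags_valid(), the flag bit values and the example masks are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, chosen only so the flags are distinct. */
enum {
	SD_SHARE_CPUPOWER      = 1 << 0,
	SD_SHARE_PKG_RESOURCES = 1 << 1,
	SD_NUMA                = 1 << 2,
	SD_ASYM_PACKING        = 1 << 3,
};

/* The only flags an architecture may set for a level. */
#define TOPOLOGY_SD_FLAGS \
	(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_NUMA | SD_ASYM_PACKING)

/* Assumption: one bit per CPU stands in for a struct cpumask. */
typedef unsigned long cpumask_bits;

struct topo_level {
	cpumask_bits (*mask)(int cpu);	/* CPUs spanned by this level for 'cpu' */
	int (*flags)(void);		/* topology flags of this level */
	const char *name;
};

/* Example masks for a 5-CPU system split into CPUs 0-1 and CPUs 2-4. */
static cpumask_bits mc_mask(int cpu) { return cpu < 2 ? 0x03UL : 0x1cUL; }
static cpumask_bits die_mask(int cpu) { (void)cpu; return 0x1fUL; }
static int mc_flags(void) { return SD_SHARE_PKG_RESOURCES; }
static int die_flags(void) { return 0; }

/* Lowest level first; a NULL mask terminates the table. */
static const struct topo_level example_topology[] = {
	{ mc_mask,  mc_flags,  "MC"  },
	{ die_mask, die_flags, "DIE" },
	{ NULL,     NULL,      NULL  },
};

/* Returns 1 if the level sets nothing but topology flags. */
static int level_flags_valid(const struct topo_level *tl)
{
	assert(tl != NULL && tl->flags != NULL);
	return (tl->flags() & ~TOPOLOGY_SD_FLAGS) == 0;
}
```

An arch adding a level would insert one more entry into such a table, keeping the lowest level first.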
Then, each level must be a subset of the next one. The build sequence of the
sched_domain will take care of removing useless levels like those with 1 CPU
and those with the same CPU span and relevant information for load balancing
than its child.
The paragraph above contains important information to set this up
correctly, that's why it might be worth clarifying:
- "next one" of sd means "child of sd" ?
It's the next one in the table so the parent in the sched_domain
Right, it's this way around: DIE is the parent of MC, which is the parent of GMC.
Maybe you could be more explicit about the parent-of relation here?
- "subset" means really "subset" and not "proper subset" ?
yes, it's really "subset" and not "proper subset"
Vincent
On TC2 w/ the following change in cpu_corepower_mask()
const struct cpumask *cpu_corepower_mask(int cpu)
{
- return &cpu_topology[cpu].thread_sibling;
+ return cpu_topology[cpu].socket_id ?
+ &cpu_topology[cpu].thread_sibling :
+ &cpu_topology[cpu].core_sibling;
}
I get this e.g. for CPU0,2:
CPU0: cpu_corepower_mask=0-1 -> GMC is subset of MC
CPU0: cpu_coregroup_mask=0-1
CPU0: cpu_cpu_mask=0-4
CPU2: cpu_corepower_mask=2 -> GMC is proper subset of MC
CPU2: cpu_coregroup_mask=2-4
CPU2: cpu_cpu_mask=0-4
I assume here that this is a correct set-up.
So this is a correct setup?
yes it's a correct setup before the degenerate sequence
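The "subset, not proper subset" rule can be checked mechanically. Below is a minimal sketch, assuming each cpumask is modelled as one bit per CPU in an unsigned long; mask_is_subset() and levels_are_nested() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if every CPU in 'child' is also in 'parent'; equal masks are allowed. */
static bool mask_is_subset(unsigned long child, unsigned long parent)
{
	return (child & ~parent) == 0;
}

/*
 * masks[] holds the span of each level for one CPU, lowest level first.
 * Each level must be a subset of the next one in the table.
 */
static bool levels_are_nested(const unsigned long *masks, size_t n)
{
	assert(masks != NULL);
	for (size_t i = 0; i + 1 < n; i++)
		if (!mask_is_subset(masks[i], masks[i + 1]))
			return false;
	return true;
}
```

With the TC2 values above, CPU0 gives { 0x03, 0x03, 0x1f } (GMC=0-1, MC=0-1, DIE=0-4) and CPU2 gives { 0x04, 0x1c, 0x1f } (GMC=2, MC=2-4, DIE=0-4); both pass the check.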
The domain degenerate part:
"useless levels like those with 1 CPU" ... that's the case for GMC level
for CPU2,3,4
The GMC level is destroyed because of the following code snippet in
sd_degenerate(): if (cpumask_weight(sched_domain_span(sd)) == 1)
so that's fine.
In case of CPU0,1 since GMC and MC have the same span, the code in
build_sched_groups() creates only one group for MC and that's why
pflags is altered in sd_parent_degenerate() to SD_WAKE_AFFINE (0x20) and
the if condition 'if (~cflags & pflags)' is not hit and
sd_parent_degenerate() finally returns 1 for MC.
So the "those with the same CPU span and relevant information for load
balancing than its child." part is not so easy for me to understand. Because
both levels have the same span, we actually don't take into consideration the
flags of the parent which require at least 2 groups.
That's only the case if the parent has just 1 group; otherwise we take
all flags into account.
So the TC2 example covers two corner cases for me: (1) the level I want
to get rid of only contains 1 CPU (GMC for CPU2,3,4) and (2) the span of
the parent level I want to get rid of (MC for CPU0,1) is the same as
the span of the level which should stay.
Having the same span is not enough. There must also be no relevant
differences in the flags (after removing the flags that need more than 1
group, if the parent has only 1 group).
Are these two corner cases the only ones supported here? If yes, this has
to be stated somewhere; otherwise, if somebody tries this approach on
a different topology, (s)he might be surprised.
The degenerate sequence is there to remove useless levels, but it will
not remove useful levels. This rework has not modified the behavior of
the degenerate sequence, so (s)he should take the same care as
previously.
Vincent
Could you please comment on the paragraph above too?
Thanks,
-- Dietmar
If we only consider SD_SHARE_POWERDOMAIN for the socket related level,
this works fine.
I would like to test this on more platforms but I only have my TC2
available :-)
-- Dietmar
[...]