Re: [RFCv3 PATCH 23/48] sched: Allocate and initialize energy data structures

From: Dietmar Eggemann
Date: Wed Apr 29 2015 - 11:43:30 EST


On 28/04/15 09:58, pang.xunlei@xxxxxxxxxx wrote:
> Morten Rasmussen <morten.rasmussen@xxxxxxx> wrote 2015-02-05 AM 02:31:00:
>> [RFCv3 PATCH 23/48] sched: Allocate and initialize energy data structures
>>
>> From: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
>>
>> The per sched group sched_group_energy structure plus the related
>> idle_state and capacity_state arrays are allocated like the other sched
>> domain (sd) hierarchy data structures. This includes the freeing of
>> sched_group_energy structures which are not used.
>>
>> One problem is that the number of elements of the idle_state and the
>> capacity_state arrays is not fixed and has to be retrieved in
>> __sdt_alloc() to allocate memory for the sched_group_energy structure and
>> the two arrays in one chunk. The array pointers (idle_states and
>> cap_states) are initialized here to point to the correct place inside the
>> memory chunk.
>>
>> The new function init_sched_energy() initializes the sched_group_energy
>> structure and the two arrays in case the sd topology level contains energy
>> information.

[...]

>>
>> +static void init_sched_energy(int cpu, struct sched_domain *sd,
>> + struct sched_domain_topology_level *tl)
>> +{
>> + struct sched_group *sg = sd->groups;
>> + struct sched_group_energy *energy = sg->sge;
>> + sched_domain_energy_f fn = tl->energy;
>> + struct cpumask *mask = sched_group_cpus(sg);
>> +
>> + if (!fn || !fn(cpu))
>> + return;
>
> Maybe if there's no valid fn(), we can dec the sched_group_energy's
> reference count, so that it can be freed if no one uses it.

Good catch! We actually want that sg->sge is NULL if there is no
energy function fn or fn returns NULL. We never noticed that this is
not the case since we have tested the whole patch-set only with energy
functions available for each sched domain (sd) so far.

All sd's lower or equal 'struct sched_domain * ea_sd' (highest level
at which energy model is provided) have to provide a valid energy
function fn. A check which is currently missing as well.

Instead of dec the ref count, I could defer the inc from get_group to
init_sched_energy.

>
> Also, this function may enter several times for the shared sge,
> there is no need to do the duplicate operation below. Adding
> this would be better?
>
> if (cpu != group_balance_cpu(sg))
> return;
>

That's true.

This snippet gives the functionality on top of this patch (Tested on
a two cluster ARM system w/ fn set to NULL on MC or DIE sd level or
both in arm_topology[] (arch/arm/kernel/topology.c)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c49f3ee928b8..6d9b5327a2b6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5969,7 +5969,6 @@ static int get_group(int cpu, struct sd_data *sdd, struct sched_group **sg)
(*sg)->sgc = *per_cpu_ptr(sdd->sgc, cpu);
atomic_set(&(*sg)->sgc->ref, 1); /* for claim_allocations */
(*sg)->sge = *per_cpu_ptr(sdd->sge, cpu);
- atomic_set(&(*sg)->sge->ref, 1); /* for claim_allocations */
}

return cpu;
@@ -6067,8 +6066,16 @@ static void init_sched_energy(int cpu, struct sched_domain *sd,
sched_domain_energy_f fn = tl->energy;
struct cpumask *mask = sched_group_cpus(sg);

- if (!fn || !fn(cpu))
+ if (cpu != group_balance_cpu(sg))
+ return;
+
+ if (!fn || !fn(cpu)) {
+ sg->sge = NULL;
return;
+ }
+
+ atomic_set(&sg->sge->ref, 1); /* for claim_allocations */
+

[...]

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/