Re: [PATCH 1/1] sched/fair: Fix unfairness caused by missing load decay

From: Dietmar Eggemann
Date: Wed May 05 2021 - 05:43:41 EST


On 01/05/2021 16:41, Odin Ugedal wrote:
> Hi,
>
>> I think what I see on my Juno running the unfairness_missing_load_decay.sh script is
>> in sync with what you discussed here:
>
> Thanks for taking a look!
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 794c2cb945f8..7214e6e89820 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10854,6 +10854,8 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>>  			break;
>>  
>>  		update_load_avg(cfs_rq, se, UPDATE_TG);
>> +		if (!cfs_rq_is_decayed(cfs_rq))
>> +			list_add_leaf_cfs_rq(cfs_rq);
>>  	}
>>  }
>
> This might however lead to "loss" at /slice/cg-2/sub and
> slice/cg-1/sub in this particular case tho, since
> propagate_entity_cfs_rq skips one cfs_rq
> by taking the parent of the provided se. The loss in that case would
> however not be equally big, but will still often contribute to some
> unfairness.

Yeah, that's true.
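
To make the skipped level concrete, here is a rough userspace sketch of
the walk (not kernel code). It assumes only what is said above: the loop
starts at the parent of the provided se. The toy_* names and the
four-level chain are hypothetical, and the sketch ignores throttling and
the cfs_rq_is_decayed() check from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the links matter here. */
struct toy_cfs_rq {
	const char *path;
	int on_leaf_list;	/* set once the walk reaches this cfs_rq */
};

struct toy_se {
	struct toy_se *parent;		/* NULL at the top of the hierarchy */
	struct toy_cfs_rq *cfs_rq;	/* the cfs_rq this entity is queued on */
};

struct toy_result {
	int visited;		/* number of cfs_rqs the walk touched */
	int sub_reached;	/* 1 if /slice/cg-1/sub was touched */
};

/*
 * Same shape as the loop in the diff: start at se->parent and mark every
 * cfs_rq reached from there. The cfs_rq of the se passed in is never seen.
 */
int toy_propagate(struct toy_se *se)
{
	int visited = 0;

	assert(se != NULL);
	for (se = se->parent; se; se = se->parent) {
		se->cfs_rq->on_leaf_list = 1;
		visited++;
	}
	return visited;
}

/* Build task -> sub -> cg-1 -> slice -> root and run the walk once. */
struct toy_result toy_demo(void)
{
	struct toy_cfs_rq root = { "/", 0 };
	struct toy_cfs_rq slice = { "/slice", 0 };
	struct toy_cfs_rq cg1 = { "/slice/cg-1", 0 };
	struct toy_cfs_rq sub = { "/slice/cg-1/sub", 0 };
	struct toy_se slice_se = { NULL, &root };
	struct toy_se cg1_se = { &slice_se, &slice };
	struct toy_se sub_se = { &cg1_se, &cg1 };
	struct toy_se task_se = { &sub_se, &sub };
	struct toy_result res;

	res.visited = toy_propagate(&task_se);
	res.sub_reached = sub.on_leaf_list;
	return res;
}
```

In this toy chain the walk touches /slice/cg-1, /slice and the root, but
never /slice/cg-1/sub, which is the level the task itself is queued on.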

By moving stopped `stress` tasks into

/sys/fs/cgroup/cpu/slice/cg-{1,2}/sub

and then into

/sys/fs/cgroup/cpuset/A

whose cpuset.cpus {0-1,4-5} does not contain the CPUs {2,3} that the
`stress` tasks were attached to, and then restarting the `stress` tasks,
I get:

cfs_rq[1]:/slice/cg-1/sub
  .load_avg                      : 1024
  .removed.load_avg              : 0
  .tg_load_avg_contrib           : 1024 <---
  .tg_load_avg                   : 2047 <---
  .se->avg.load_avg              : 511
cfs_rq[1]:/slice/cg-1
  .load_avg                      : 512
  .removed.load_avg              : 0
  .tg_load_avg_contrib           : 512 <---
  .tg_load_avg                   : 1022 <---
  .se->avg.load_avg              : 512
cfs_rq[1]:/slice
  .load_avg                      : 513
  .removed.load_avg              : 0
  .tg_load_avg_contrib           : 513
  .tg_load_avg                   : 1024
  .se->avg.load_avg              : 512
cfs_rq[5]:/slice/cg-1/sub
  .load_avg                      : 1024
  .removed.load_avg              : 0
  .tg_load_avg_contrib           : 1023 <---
  .tg_load_avg                   : 2047 <---
  .se->avg.load_avg              : 511
cfs_rq[5]:/slice/cg-1
  .load_avg                      : 512
  .removed.load_avg              : 0
  .tg_load_avg_contrib           : 510 <---
  .tg_load_avg                   : 1022 <---
  .se->avg.load_avg              : 511
cfs_rq[5]:/slice
  .load_avg                      : 512
  .removed.load_avg              : 0
  .tg_load_avg_contrib           : 511
  .tg_load_avg                   : 1024
  .se->avg.load_avg              : 510

I saw that your v2 patch takes care of that.
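
For reading the marked lines in the dump above, a small helper like the
following may be handy. It is only a sketch with hypothetical function
names. It assumes that a group's tg_load_avg is the sum of the
tg_load_avg_contrib values of all its per-CPU cfs_rqs, so any remainder
has to come from cfs_rqs that are not listed.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the per-CPU tg_load_avg_contrib values listed for one task group. */
long sum_contrib(const long *contrib, size_t n)
{
	long sum = 0;

	assert(contrib != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += contrib[i];
	return sum;
}

/*
 * Part of tg_load_avg that the listed cfs_rqs do not account for.
 * 0 means the listed contributions explain the whole group load.
 */
long unaccounted_load(long tg_load_avg, const long *contrib, size_t n)
{
	return tg_load_avg - sum_contrib(contrib, n);
}
```

With the figures above, /slice/cg-1/sub gives 2047 - (1024 + 1023) = 0,
/slice/cg-1 gives 1022 - (512 + 510) = 0 and /slice gives
1024 - (513 + 511) = 0, taking cfs_rq[1] and cfs_rq[5] as the only
contributors shown.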