[PATCH] sched/fair: Fix decay to zero period in decay_load()

From: Morten Rasmussen
Date: Mon Apr 11 2016 - 10:41:37 EST


In __compute_runnable_contrib() we are happy with returning LOAD_AVG_MAX
when the update period n >= LOAD_AVG_MAX_N (=345), so we should be happy
with returning zero for n >= LOAD_AVG_MAX_N when decaying in
decay_load() as well instead of only returning zero for n >
LOAD_AVG_PERIOD * 63 (=2016).

As both conditions are unlikely() the impact is quite likely very small,
but at least it makes the rounding errors for load accumulation and
decay symmetrical.

Signed-off-by: Morten Rasmussen <morten.rasmussen@xxxxxxx>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56b7d4b..42515b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2527,7 +2527,7 @@ static __always_inline u64 decay_load(u64 val, u64 n)

if (!n)
return val;
- else if (unlikely(n > LOAD_AVG_PERIOD * 63))
+ else if (unlikely(n > LOAD_AVG_MAX_N))
return 0;

/* after bounds checking we can collapse to 32-bit */
--
1.9.1


> the task's sched avgs MAY NOT be decayed to 0, depending on how
> big its sums are, and the chance to 0 is still good as load_sum
> is way less than ~0ULL and util_sum way less than ~0U.

I don't get the last bit about load_sum < ~0ULL and util_sum < ~0U.
Whether you get to zero depends on the sums (as you say) and the actual
value of 'n'. It is true that you might get to zero even if n <
LOAD_AVG_MAX_N if the sums are small.

> Nevertheless, what really maters is what happens in the worst-case
> scenario, which is when (u32)m = 0? So in that case, it would be like
> after so long a sleep, we treat the task as it never slept, and has the
> same sched averages as before it slept. That is actually not bad or
> nothing to worry about, and think of it as the same as how we treat
> a new born task.

There is subtle but important difference between not decaying a task
correctly and adding new task: The sleeping task is already accounted
for in the cfs_rq.avg.{load,util}_sum. The sleeping task's contribution
to cfs_rq.avg has been decayed correctly in the mean time which means
that by not guaranteeing a decay of the se.avg at wake-up you introduce
a discrepancy between the task's owen view of its contribution (se.avg)
and the cfs_rq view (cfs_rq.avg). That might lead to trouble if the task
is migrated at wake-up as we remove se.avg amount of contribution from
the previous cpu's cfs_rq.avg. In other words, you remove too much from
the cfs_rq.avg.

The discrepancy will decay and disappear after LOAD_AVG_MAX_N ms, which
might be acceptable, but it is not a totally harmless change IMHO.