Re: Divide-by-zero in post_init_entity_util_avg

From: Peter Zijlstra
Date: Thu Jun 16 2016 - 08:25:37 EST


On Thu, Jun 16, 2016 at 10:50:40AM +0200, Peter Zijlstra wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f75930bdd326..3fd3d903e6b6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2878,6 +2878,20 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
> }
> }
>
> +/*
> + * Explicitly do a load-store to ensure the temporary value never hits memory.
> + * This allows lockless observations without ever seeing the negative values.
> + *
> + * Incidentally, this also generates much saner code for x86.
> + */
> +#define sub_positive(type, ptr, val) do { \
> + type tmp = READ_ONCE(*ptr); \
> + tmp -= (val); \
> + if (tmp < 0) \
> + tmp = 0; \
> + WRITE_ONCE(*ptr, tmp); \
> +} while (0)
> +
> /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
> static inline int
> update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
> @@ -2887,15 +2901,15 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>
> if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> - sa->load_avg = max_t(long, sa->load_avg - r, 0);
> - sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> + sub_positive(long, &sa->load_avg, r);
> + sub_positive(s64, &sa->load_sum, r * LOAD_AVG_MAX);

Hmm, so either we should change these variables to signed types as
forced here, or this logic (along with the former) is plain wrong.

As it stands any unsigned value with the MSB set will wipe the field
after this subtraction.

I suppose instead we'd want something like:

tmp = READ_ONCE(*ptr);
if (tmp > val)
tmp -= val;
else
tmp = 0;
WRITE_ONCE(*ptr, tmp);

In order to generate:

xchg %rax,0xa0(%r13)
mov 0x78(%r13),%rcx
sub %rax,%rcx
cmovae %r15,%rcx
mov %rcx,0x78(%r13)

however, GCC isn't smart enough and generates:

xchg %rax,0x98(%r13)
mov 0x70(%r13),%rsi
mov %rsi,%rcx
sub %rax,%rcx
cmp %rsi,%rax
cmovae %r15,%rcx
mov %rcx,0x70(%r13)

Doing a CMP with the _same_ values it does the SUB with, resulting in
exactly the same CC values.

(this is with gcc-5.3, I'm still trying to build gcc-6.1 from the debian
package which I suppose I should just give up and do a source build)

Opinions?