Re: sched: odd values for effective load calculations

From: Peter Zijlstra
Date: Mon Dec 15 2014 - 07:12:48 EST



Sorry for the long delay, I was out for a few weeks due to having become
a dad for the second time.

On Sat, Dec 13, 2014 at 09:30:12AM +0100, Ingo Molnar wrote:
> * Sasha Levin <levinsasha928@xxxxxxxxx> wrote:
>
> > Hi all,
> >
> > I was fuzzing with trinity inside a KVM tools guest, running the latest -next
> > kernel along with the undefined behaviour sanitizer patch, and hit the following:
> >
> > [ 787.894288] ================================================================================
> > [ 787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
> > [ 787.898981] signed integer overflow:
> > [ 787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'

So that's:

	this_eff_load *= this_load +
		effective_load(tg, this_cpu, weight, weight);

Going by the numbers, the 101500 must be 'this_eff_load'; 100 * ~1024
gives roughly that. That makes the rhs 'large'. Do you have
CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
are you using?
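
To put a number on 'large', here is a minimal userspace sketch (not the
kernel code) that checks the reported multiplication and computes the
largest rhs that still fits when multiplied by 101500. The helper names
mul_overflows_s64() and max_safe_rhs() are made up for illustration, and
it assumes a compiler that provides __builtin_mul_overflow (GCC 5+ or
clang):

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not from kernel/sched/fair.c: returns true if
 * a * b does not fit in a signed long long. On success *res holds the
 * product. Relies on the GCC/clang builtin __builtin_mul_overflow.
 */
static bool mul_overflows_s64(long long a, long long b, long long *res)
{
	return __builtin_mul_overflow(a, b, res);
}

/*
 * Hypothetical helper: largest non-negative rhs that can be multiplied
 * by 'factor' without overflowing. Assumes factor > 0.
 */
static long long max_safe_rhs(long long factor)
{
	return LLONG_MAX / factor;
}
```

Calling mul_overflows_s64(361516561629678LL, 101500LL, &res) reports an
overflow, matching the UBSan splat, and max_safe_rhs(101500) gives the
bound the rhs would have to stay under for that factor.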

In any case, it's a bit sad this doesn't include a register dump :/

Is this easy to reproduce or something that happened once?

> > The values for effective load seem a bit off (and are overflowing!).
>
> It definitely looks like a bug in SMP load balancing!

Yeah, although theoretically (and somewhat practically) this can be
triggered in more places if you manage to run up the 'weight' with
enough tasks.

That said, it should at worst result in 'funny' balancing behaviour, not
anything else.