Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems

From: Morten Rasmussen
Date: Mon Dec 14 2015 - 09:46:30 EST


On Mon, Dec 14, 2015 at 03:20:21PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 14, 2015 at 01:07:26PM +0000, Morten Rasmussen wrote:
>
> > Agreed, >100% is a transient state (which can be rather long) which only
> > means over-utilized, nothing more. Would you like the metric itself to
> > be changed to saturate at 100% or just cap it to 100% when used?
>
> We already cap it when using it IIRC. But no, I was thinking of the
> measure itself.

Yes, okay.

>
> > It is not straight forward to provide a bound on the sum.
>
> Agreed..
>
> > There isn't one for load_avg either.
>
> But that one is fundamentally unbound, whereas the util thing is
> fundamentally bound, except our implementation isn't.

Agreed.

>
> > If we want to guarantee an upper bound for
> > cfs_rq->avg.util_sum we have to somehow cap the se->avg.util_avg
> > contributions for each sched_entity. This cap depends on the cpu and how
> > many other tasks are associated with that cpu. The cap may have to
> > change when tasks migrate.
>
> Yep, blows :-)
>
> > > However, I think that makes sense, but would propose doing it
> > > differently. That condition is generally a maximum (assuming proper
> > > functioning of the weight based scheduling etc..) for any one task, so
> > > on migrate we can hard clip to this value.
>
> > Why use load.weight to scale util_avg? It is affected by priority. Isn't
> > just the ratio 1/nr_running that you are after?
>
> Remember, the util thing is based on running, so assuming each task
> always wants to run, each task gets to run w_i/\Sum_j w_j due to CFS
> being a weighted fair queueing thingy.

Of course, yes.

>
> > IIUC, you propose to clip the sum itself. In which case you are running
> > into trouble when removing tasks. You don't know how much to remove from
> > the clipped sum.
>
> Right, then we'll have to slowly gain it again.

If you have a seriously over-utilized cpu and migrate some of the tasks
to a different cpu the old cpu may temporarily look lightly utilized
even if we leave some big tasks behind. That might lead us to trouble if
we start using util_avg as the basis for cpufreq decisions. If we care
about performance, the safe choice is to consider an cpu over-utilized
still over-utilized even after we have migrated tasks away. We can only
trust that the cpu is no longer over-utilized when cfs_rq->avg.util_avg
'naturally' goes below 100%. So from that point of view, it might be
better to let it stay 100% and let it sort itself out.

> > Another problem is that load.weight is just a snapshot while
> > avg.util_avg includes tasks that are not currently on the rq so the
> > scaling factor is probably bigger than what you want.
>
> Our weight guestimates also include non running (aka blocked) tasks,
> right?

The rq/cfs_rq load.weight doesn't. It is updated through
update_load_{add,sub}() in account_entity_{enqueue,dequeue}(). So only
runnable+running tasks I think.

> > If we leave the sum as it is (unclipped) add/remove shouldn't give us
> > any problems. The only problem is the overflow, which is solved by using
> > a 64bit type for load_avg. That is not an acceptable solution?
>
> It might be. After all, any time any of this is needed we're CPU bound
> and the utilization measure is pointless anyway. That measure only
> matters if its small and the sum is 'small'. After that its back to the
> normal load based thingy.

Yes, agreed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/