Re: [RFC] sched: CPU topology try

From: Morten Rasmussen
Date: Wed Jan 08 2014 - 07:35:42 EST


On Tue, Jan 07, 2014 at 08:49:51PM +0000, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 03:41:54PM +0000, Morten Rasmussen wrote:
> > I think that could work if we sort out the priority scaling issue that I
> > mentioned before.
>
> We talked a bit about this on IRC a month or so ago, right? My memories
> from that are that your main complaint is that we don't detect the
> overload scenario right.
>
> That is; the point at which we should start caring about SMP-nice is
> when all our CPUs are fully occupied, because up to that point we're
> under utilized and work preservation mandates we utilize idle time.

Yes. I think I stated the problem differently, but we are talking about
the same thing. Basically, priority scaling in the task load_contrib
means that runnable_load_avg and blocked_load_avg are poor indicators of
cpu load (available idle time). Priority scaling only makes sense when
the system is fully utilized. When it isn't, it just gives us a
potentially very inaccurate picture of the available idle time.

Pretty much what you just said :-)

> Currently we detect overload by sg.nr_running >= sg.capacity, which can
> be very misleading because while a cpu might have a task running 'now'
> it might be 99% idle.
>
> At which point I argued we should change the capacity thing anyhow. Ever
> since the runnable_avg patch set I've been arguing to change that into
> an actual utilization test.
>
> So I think that if we measure overload by something like >95% utilization
> on the entire group the load scaling again makes perfect sense.

I agree that it makes more sense to base the overload test on some
tracked load. What about the non-overloaded case, though? Would load
balancing then have to be based on unweighted task loads?

>
> Given the 3 task {A,B,C} workload where A and B are niced, to land on a
> symmetric dual CPU system like: {A,B}+{C}, assuming they're all while(1)
> loops :-).
>
> The harder case is where all 3 tasks are of equal weight; in which case
> fairness would mandate we (slowly) rotate the tasks such that they all
> get 2/3 time -- we also horribly fail at this :-)

I have encountered that one a number of times. All the middleware noise
in Android sometimes gives that effect.

I'm not sure if the NUMA guy would like a rotating scheduler though :-)