Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()

From: Morten Rasmussen
Date: Thu Jul 09 2015 - 10:29:47 EST


On Mon, Jul 06, 2015 at 06:31:44AM +0800, Yuyang Du wrote:
> On Fri, Jul 03, 2015 at 06:38:31PM +0200, Peter Zijlstra wrote:
> > > I'm not against having a policy that sits somewhere in between, we just
> > > have to agree it is the right policy and clean up the load-balance code
> > > such that the implemented policy is clear.
> >
> > Right, for balancing it's a tricky question, but mixing them without
> > intent is, as you say, a bit of a mess.
> >
> > So clearly blocked load doesn't make sense for (new)idle balancing. OTOH
> > it does make some sense for the regular periodic balancing, because
> > there we really do care mostly about the averages, esp. so when we're
> > overloaded -- but there are issues there too.
> >
> > Now we can't track them both (or rather we could, but overhead).
> >
> > I like Yuyang's load tracking rewrite, but it changes exactly this part,
> > and I'm not sure I understand the full ramifications of that yet.

I don't think anybody does ;-) But I think we should try to make it
work.

> Thanks. It would be a pure average policy, which is imperfect like the
> current one and certainly needs some mixing like the current one, but
> it is a worthwhile starting point: it is simple and reasonable, and
> the other parts built on top of it can be simple and reasonable too.

I think we all agree on the benefits of taking blocked load into
account but also that there are some policy questions to be addressed.
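To make the mixing question concrete, here is a minimal sketch (not
existing code; the helper name and the policy split are my assumptions,
and it leans on the pre-rewrite cfs_rq fields runnable_load_avg and
blocked_load_avg) of what choosing the metric per balance type could
look like:

	/*
	 * Hypothetical helper: pick the load metric based on why we
	 * are balancing.  Purely illustrative.
	 */
	static unsigned long lb_source_load(struct rq *rq,
					    enum cpu_idle_type idle)
	{
		unsigned long load = rq->cfs.runnable_load_avg;

		/*
		 * A newly idle cpu needs a runnable task right now;
		 * blocked load represents sleepers that may never wake
		 * up here, so leave it out for newidle balance.
		 */
		if (idle == CPU_NEWLY_IDLE)
			return load;

		/* Periodic balance cares about the long-run average. */
		return load + rq->cfs.blocked_load_avg;
	}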

> > One way out would be to split the load balancer into 3 distinct regions;
> >
> > 1) get a task on every CPU, screw everything else.
> > 2) get each CPU fully utilized, still ignoring 'load'
> > 3) when everybody is fully utilized, consider load.

Seems very reasonable to me. We more or less follow that idea in the
energy-model driven scheduling patches, at least for 2) and 3).
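As a straw man, the three regions could be made explicit along these
lines (the enum, the helper and cpu_fully_utilized() are hypothetical,
just to pin down the proposal, not existing code):

	enum lb_region {
		LB_SPREAD_TASKS,	/* 1) a task on every cpu */
		LB_SPREAD_UTIL,		/* 2) fill each cpu, ignore load */
		LB_SPREAD_LOAD,		/* 3) all full: consider load */
	};

	static enum lb_region classify_lb_region(struct sched_domain *sd)
	{
		int cpu;

		for_each_cpu(cpu, sched_domain_span(sd))
			if (!cpu_rq(cpu)->nr_running)
				return LB_SPREAD_TASKS;

		for_each_cpu(cpu, sched_domain_span(sd))
			if (!cpu_fully_utilized(cpu))	/* hypothetical */
				return LB_SPREAD_UTIL;

		return LB_SPREAD_LOAD;
	}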

The difficult bit is detecting when to transition between 2) and 3). If
you want to enforce smp_nice you have to start worrying about task
priority as soon as one cpu is fully utilized.

For example, suppose a fully utilized cpu is running two high-priority
tasks while all the other cpus are running low-priority tasks and are
not fully utilized. The utilization imbalance may be too small to cause
any task to be migrated, so we end up giving the high-priority tasks
fewer cycles than their priority entitles them to.
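To put rough numbers on it (weights as in the kernel's prio_to_weight[]
table): say cpu0 runs two nice-0 tasks (weight 1024 each, load 2048,
100% utilized) while cpu1 runs three nice-19 tasks (weight 15 each,
load 45, 60% utilized). A purely utilization-based balancer sees a
modest 40% gap and may do nothing; a load-based one sees 2048 vs 45 and
migrates a nice-0 task over, which is what smp_nice requires.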

> > If we make find_busiest_foo() select one of these 3, and make
> > calculate_imbalance() invariant to the metric passed in, and have things
> > like cpu_load() and task_load() return different, but coherent, numbers
> > depending on which region we're in, this almost sounds 'simple'.
> >
> > The devil is in the details, and the balancer is a hairy nest of details
> > which will make the above non-trivial.

Yes, but if we have an overall policy like the one you propose we can at
least make it complicated and claim that we think we know what it is
supposed to do ;-)

I agree that there is some work to be done in find_busiest_*() and
calculate_imbalance() + friends. Maybe step one should be to clean them
up a bit.
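Something like a region-aware cpu_load() could be the glue between the
two (again just a sketch on top of the hypothetical enum above, and I
may be misremembering the exact utilization field name):

	/* Return the metric the current region says we balance on. */
	static unsigned long lb_cpu_load(int cpu, enum lb_region region)
	{
		struct rq *rq = cpu_rq(cpu);

		switch (region) {
		case LB_SPREAD_TASKS:
			return rq->nr_running;			/* count */
		case LB_SPREAD_UTIL:
			return rq->cfs.utilization_load_avg;	/* no prio */
		case LB_SPREAD_LOAD:
		default:
			return rq->cfs.runnable_load_avg;	/* weighted */
		}
	}

find_busiest_group() and calculate_imbalance() would then compare
whatever numbers the current region dictates, which is roughly the
'coherent numbers' property asked for above.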