Re: [PATCH] Fix longstanding load balancing bug in the scheduler.

From: Nick Piggin
Date: Thu Sep 07 2006 - 17:45:54 EST


On Thu, Sep 07, 2006 at 10:24:26AM -0700, Christoph Lameter wrote:
> Hmmm... Some more comments
>
> On Thu, 7 Sep 2006, Nick Piggin wrote:
>
> > So what I worry about with this approach is that it can really blow
> > out the latency of a balancing operation. Say you have N-1 CPUs with
> > lots of stuff locked on their runqueues.
>
> Right but that situation is rare and the performance is certainly
> better if the unpinned processes are not all running on the same
> processor.

But the situation is rare full stop otherwise we'd have had more
complaints. And we do tend to care about theoretical max latency
in many other obscure paths than the scheduler.

How about if you have N/2 CPUs with lots of stuff on runqueues?
Then the other N/2 will each be scanning N/2+1 runqueues... that's
a lot of bus traffic going on which you probably don't want.

>
> > The solution I envisage is to do a "rotor" approach. For example
> > the last attempted CPU could be stored in the starving CPU's sd...
> > and it will subsequently try another one.
>
> That wont work since the notion of "pinned" is relative to a cpu.
> A process may be pinned to a group of processors. It will only appear to
> be pinned for cpus outside that set of processors!

A sched-domain is per-CPU as well. Why doesn't it work?

> What good does storing a processor number do if the processes can change
> dynamically and so may the pinning of processors. We are dealing with
> a set of processes. Each of those may be pinned to a set of processors.

Yes, it isn't going to be perfect, but it would at least work (which
it doesn't now) without introducing that latency.

>
> > I've been hot and cold on such an implementation for a while: on one
> > hand it is a real problem we have; OTOH I was hoping that the domain
> > balancing might be better generalised. But I increasingly don't
> > think we should let perfect stand in the way of good... ;)
>
> I think we should fix the problem ASAP. Lets do this fix and then we can
> think about other solutions. You already had lots of time to think about
> the rotor.

The problem has been known about for many years. It doesn't remain
unfixed because we didn't know about this brute force patch ;)
Given that Ingo also maintains the -RT patchset, it might face a
rocky road... but if he is OK with it then OK by me for now.

> This looks to me as a design flaw that would require either a major rework
> of the scheduler or (my favorite) a delegation of difficult (and
> computational complex and expensive) scheduler decisions to user space.

I'd be really happy to move this to userspace :)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/