[RFC] 2.5.X semaphore starvation fix^H^H^Hwork-around

From: Mike Galbraith (efault@gmx.de)
Date: Wed May 07 2003 - 06:00:04 EST


Greetings Folks,

As some of you know, I've been muttering about the behavior of the new
scheduler, and discovered that the priority boost tasks receive via
sleeping on a contended lock is my main problem. This can lead to a whole
string of tasks which have been promoted above _other_ lock holders,
effectively starving them for an indeterminate amount of time with some
workloads. (make -j30 bzImage in an ext3 fs is a wonderful example, but the
problem appears to be generic, and is visible here in ext2 as well). With
ext3 on my up/preempt box, this leads to nearly zero concurrency for a make
-j30 bzImage.

The method I decided to try in order to defeat this problem is to take
advantage of those times when the scheduler would normally just return to
the same task (which the programmer asked to be rescheduled), and at that
point slip in a lower-priority task if possible, to give a lower-priority
lock holder a chance to release its lock. For instance, once you have
_one_ priority 17 task left, and it has finished its timeslice, instead of
returning to that task (or higher), select the stalest task from either
the active array or the expired array instead. It won't run for long, but
it _might_ have time to
release the lock. If you try the attached in ext3, you'll see that it's
pretty effective. It's still subject to a long queue at one priority, but
I figured I'd post this for now, and see what kind of comments/flames I get
back before doing any more experiments 8)
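
To make the selection rule concrete, here's a toy user-space model of it
(hand-waved, _not_ the patch itself; the struct, the pick_next() helper and
the field names are all invented, and the real code naturally works on the
runqueue's active/expired priority arrays rather than a flat list):

#include <stddef.h>
#include <stdio.h>

struct toy_task {
	int prio;			/* lower number == higher priority */
	unsigned long last_run;		/* "age": last tick this task ran */
};

/*
 * When the only runnable task at or above current's priority is current
 * itself, pick the runnable task that has waited longest (the "stalest"),
 * regardless of its priority, and let it run one short slice so it can
 * drop whatever lock it may be holding.
 */
static struct toy_task *pick_next(struct toy_task *tasks, int n,
				  struct toy_task *current)
{
	struct toy_task *stalest = NULL;
	int i;

	for (i = 0; i < n; i++) {
		struct toy_task *p = &tasks[i];

		if (p == current)
			continue;
		/* Someone of equal or higher priority is runnable, so
		 * ordinary selection is fine; no anti-starvation needed. */
		if (p->prio <= current->prio)
			return p;
		if (!stalest || p->last_run < stalest->last_run)
			stalest = p;
	}
	/* Nobody at current's priority or better: slip in the stalest
	 * lower-priority task instead of rerunning current. */
	return stalest ? stalest : current;
}

int main(void)
{
	struct toy_task tasks[] = {
		{ .prio = 17, .last_run = 100 },	/* "current" */
		{ .prio = 25, .last_run = 10 },		/* stale lock holder */
		{ .prio = 22, .last_run = 60 },
	};
	struct toy_task *next = pick_next(tasks, 3, &tasks[0]);

	printf("next: prio %d, last_run %lu\n", next->prio, next->last_run);
	return 0;
}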

I also made sure that no task will keep the CPU for more than 50ms without
at least _trying_ to switch. This is a variant of Ingo's timeslice-split
method, but it doesn't introduce (isn't supposed to, anyway ;) any unneeded
context switches for friendly tasks.
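
For the curious, here's an equally hand-waved user-space model of that cap
(again _not_ the patch; FORCE_RESCHED_MS, last_switch_try and both helpers
are invented names). The tick marks the task for rescheduling once it has
run for 50ms, and the "attempt" only turns into a real context switch if a
better candidate actually exists:

#include <stdbool.h>
#include <stdio.h>

#define FORCE_RESCHED_MS 50

struct toy_task {
	int id;
	unsigned long last_switch_try;	/* time (ms) of last pass through toy_schedule() */
	bool need_resched;
};

/* Called from the (modelled) timer tick, once per millisecond. */
static void toy_tick(struct toy_task *curr, unsigned long now_ms)
{
	/* Don't let a task hold the CPU for more than 50ms without at
	 * least _trying_ to switch. */
	if (now_ms - curr->last_switch_try >= FORCE_RESCHED_MS)
		curr->need_resched = true;
}

/* The "try" itself: run selection, but only count it as a context
 * switch if someone other than curr gets picked. */
static struct toy_task *toy_schedule(struct toy_task *curr,
				     struct toy_task *other /* may be NULL */,
				     unsigned long now_ms)
{
	struct toy_task *next = other ? other : curr;

	curr->need_resched = false;
	curr->last_switch_try = now_ms;
	if (next == curr)
		printf("t=%lums: no better candidate, task %d keeps the CPU\n",
		       now_ms, curr->id);
	else
		printf("t=%lums: switch task %d -> task %d\n",
		       now_ms, curr->id, next->id);
	return next;
}

int main(void)
{
	struct toy_task a = { .id = 1 };
	unsigned long t;

	for (t = 1; t <= 120; t++) {
		toy_tick(&a, t);
		if (a.need_resched)
			toy_schedule(&a, NULL, t);	/* alone on the CPU: no switch results */
	}
	return 0;
}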

Comments/flames welcome. (test reports as well)

        -Mike

