Potential 2.2.8 scheduler bugs

Dimitris Michailidis (dimitris@darkside.engr.sgi.com)
Wed, 12 May 1999 11:35:57 -0700 (PDT)


There appear to be a few bugs in the 2.2.8 scheduler on MP systems. The
first has to do with reschedule_idle_slow(). Suppose there are two CPUs,
0 and 1, each running a time-shared thread. At some point two RT threads
become runnable and go through reschedule_idle_slow() one after the
other. It is possible that both threads will choose to run on CPU 0. When
CPU 0 finally reschedules it will pick the higher-priority RT task and
leave the other one on the run queue. While that RT task waits on the run
queue, CPU 1 continues executing its low-priority time-shared thread. The
problem is that reschedule_idle_slow() allows several high-priority
threads to select the same CPU as a target, yet that CPU can only pick one
of them. No attempt is made to reschedule the high-priority threads that
were not chosen by their desired CPU, even though they may be able to run
on other CPUs.
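
To make the failure concrete, here is a minimal sketch of the kind of
target-selection loop involved. This is not the actual 2.2.8 code;
pick_target(), cpu_curr[], and the one-argument goodness() are
illustrative stand-ins:

        /*
         * Illustrative only -- not the kernel's actual code.  Each wakeup
         * scans for the CPU running the lowest-priority task, but nothing
         * records that a reschedule is already pending on the winner.
         */
        static int pick_target(void)
        {
                int cpu, best = -1, worst = INT_MAX;

                for (cpu = 0; cpu < smp_num_cpus; cpu++) {
                        int g = goodness(cpu_curr[cpu]); /* weight of what's running */
                        if (g < worst) {
                                worst = g;
                                best = cpu;      /* weakest CPU wins */
                        }
                }
                return best;
        }

Both RT wakeups run this loop before CPU 0 actually reschedules, so both
see CPU 0's time-shared thread as the weakest target and both pick CPU 0;
the loser sits on the run queue while CPU 1 keeps running its low-priority
thread untouched.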

The second problem has to do with counter recalculation. The current
implementation suffers from a race: there is a window between the time a
processor notices that the counters must be recalculated and the time it
actually recalculates them, and during this window other CPUs may notice
the same thing. Suppose CPU 0 decides it needs to recalculate the
counters. After it drops the runqueue lock and before it gets a chance to
do the recalculation, CPU 1 may also notice that the counters need
recalculating. The result is that the counters can be recalculated many
times in succession.
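
The pattern in schedule() is roughly the following (paraphrased, with
details omitted); the race lives between dropping the runqueue lock and
finishing the loop:

        /* Paraphrased from 2.2's schedule() -- simplified. */
        if (!c) {                                  /* all runnable counters are 0 */
                spin_unlock_irq(&runqueue_lock);   /* window opens */
                read_lock(&tasklist_lock);
                for_each_task(p)                   /* another CPU can see c == 0  */
                        p->counter = (p->counter >> 1) + p->priority;
                read_unlock(&tasklist_lock);
                spin_lock_irq(&runqueue_lock);     /* window closes */
        }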

On another front, release() in exit.c contains the following piece of code:

        for (;;) {
                int has_cpu;
                spin_lock_irq(&runqueue_lock);
                has_cpu = p->has_cpu;
                spin_unlock_irq(&runqueue_lock);
                if (!has_cpu)
                        break;
                do {
                        barrier();
                } while (p->has_cpu);
        }

My understanding is that this code exists to make sure that a zombie thread
has a chance to deschedule itself before its task_struct is reclaimed. If
that is the case, we are just waiting for has_cpu to transition from 1 to 0.
Why isn't it enough to just write

        while (p->has_cpu)
                barrier();

Why do we need the lock, and why do we need the outer loop?

-----
Dimitris Michailidis dimitris@engr.sgi.com
