The second problem concerns counter recalculation, where the current
implementation has a race condition. Suppose that CPU 0 decides it needs
to recalculate the counters. After it drops the runqueue lock, but before
it gets the chance to recalculate the counters, CPU 1 may also notice
that the counters need recalculation. In general, there is a window between
the time a processor notices that the counters must be recalculated and the
time it actually does so, and during this window other CPUs may notice the
same thing. As a result, the counters can be recalculated several times in
succession.
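
To make the window concrete, here is a minimal single-threaded sketch that
replays the interleaving described above and shows one possible way to close
it: remember a generation number when the need is noticed, and skip the
recalculation if the generation has moved by the time the lock is retaken.
This is only an illustration, not the scheduler's code and not a proposed
patch. Every name in it (sim_rq, recalc_gen, notice_recalc_needed, and so on)
is hypothetical, and the runqueue lock is only imagined in the comments.

```c
#include <assert.h>

/* Hypothetical stand-in for the scheduler state; not a kernel structure. */
struct sim_rq {
	unsigned long recalc_gen;	/* bumped once per completed recalculation */
	int recalc_count;		/* how many recalculations actually ran */
	int counters_exhausted;		/* nonzero when a recalculation is needed */
};

/*
 * Step 1, imagined to run with the runqueue lock held: notice that a
 * recalculation is needed and remember the current generation.
 */
static int notice_recalc_needed(const struct sim_rq *rq, unsigned long *seen_gen)
{
	if (!rq->counters_exhausted)
		return 0;
	*seen_gen = rq->recalc_gen;
	return 1;
}

/*
 * Step 2, imagined to run after the lock was dropped and retaken:
 * recalculate only if no other CPU has done so in the meantime.
 */
static int recalc_if_still_needed(struct sim_rq *rq, unsigned long seen_gen)
{
	if (rq->recalc_gen != seen_gen)
		return 0;		/* someone else already recalculated */
	rq->recalc_count++;		/* stands in for the real recalculation */
	rq->recalc_gen++;
	rq->counters_exhausted = 0;
	return 1;
}

/* Replay the race: both CPUs notice first, then both try to recalculate. */
static int replay_race(void)
{
	struct sim_rq rq = { 0, 0, 1 };
	unsigned long gen0, gen1;
	int n0 = notice_recalc_needed(&rq, &gen0);	/* CPU 0 notices */
	int n1 = notice_recalc_needed(&rq, &gen1);	/* CPU 1 notices in the window */

	assert(n0 && n1);
	recalc_if_still_needed(&rq, gen0);	/* CPU 0 recalculates */
	recalc_if_still_needed(&rq, gen1);	/* CPU 1 sees a new generation, skips */
	return rq.recalc_count;
}
```

Without the generation check in recalc_if_still_needed(), the same replay
would run the recalculation twice, which is the behavior described above.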

On another front, release() in exit.c contains the following piece of code:

	for (;;) {
		int has_cpu;
		spin_lock_irq(&runqueue_lock);
		has_cpu = p->has_cpu;
		spin_unlock_irq(&runqueue_lock);
		if (!has_cpu)
			break;
		do {
			barrier();
		} while (p->has_cpu);
	}

My understanding is that this code exists to make sure that a zombie thread
has a chance to deschedule itself before its task_struct is reclaimed. If
this is the case, we are just waiting for has_cpu to transition from 1 to 0.
Why isn't it enough to just write

	while (p->has_cpu)
		barrier();

Why do we need the lock and why do we need the outer loop?
-----
Dimitris Michailidis dimitris@engr.sgi.com