Re: 2.4.4 sluggish under fork load

From: Andrea Arcangeli (andrea@suse.de)
Date: Tue May 01 2001 - 11:55:17 EST


On Tue, May 01, 2001 at 07:18:49AM +0200, Andrea Arcangeli wrote:
> I'm running with this below patch applied since a some time (I didn't
> submitted it because for some reason unless I do p->policy &=
> ~SCHED_YIELD ksoftirqd deadlocks at boot and I didn't yet investigated
> why, and I'd like to have the whole picture on it first):

OK I found the explanation now. The reason ksoftirqd was deadlocking on
me without the explicit clear of SCHED_YIELD in p->policy is because a
softirq event was pending at the time of the first kernel_thread() and
then while returning from the syscall it was so taking the ret_from_irq
path that skips the reschedule [which was supposed to clear the
sched_yield and to reschedule the child] because CS was pointing to the
kernel descriptor. So init then runs with SCHED_YIELD set and when it
executes kernel_thread(ksoftirqd) also ksoftirqd inherit SCHED_YIELD set
too (copied at top of do_fork) and it never gets scheduled -> deadlock.

Basically there's no guarantee that any kernel_thread will return with
SCHED_YIELD cleared.

And if you fork off a child with its p->policy SCHED_YIELD set it will
never get scheduled in.

Only "just" running tasks can have SCHED_YIELD set.

So the below lines are the *right* and most robust approch as far I can
tell. (plus counter needs to be volatile, as every variable that can
change under the C code, even while it's probably not required by the
code involved with current->counter)

> + {
> + int counter = current->counter >> 1;
> + current->counter = p->counter = counter;
> + p->policy &= ~SCHED_YIELD;
> + current->policy |= SCHED_YIELD;
> + current->need_resched = 1;
> + }

Alan, the patch you merged in 2.4.4ac2 can fail like mine, but it may fail in
a much more subtle way, while I notice if ksoftirqd never get scheduled
because I synchronize on it and I deadlock, your kupdate/bdflush/kswapd
may be forked off correctly but they can all have SCHED_YIELD set and
they will *never* get scheduled. You know what can happen if kupdate
never gets scheduled... I recommend to be careful with 2.4.4ac2.

My patch (part of it quoted above) is the right replacement for the code
in 2.4.4ac2 (you may want to do `counter = current->counter + 1 >> 1'
tricks additionally to that, I will change it a bit too for that minor
part.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon May 07 2001 - 21:00:10 EST