[patch] wake_one for accept(2) [was Re: Overscheduling DOES happen

Andrea Arcangeli (andrea@e-mind.com)
Tue, 11 May 1999 02:24:28 +0200 (CEST)


On Thu, 6 May 1999, Linus Torvalds wrote:

>Well, it's even easier than the above. What you do is:
>
> - the wakeup-queue is a linked list (surprise, surprise, that's
> how it works already)
> - you add the "exclusive" entries to the end of the list (or at least
> after any other non-exclusive ones: depending on how the list is
> organized one or the other may be the more efficient way to handle it).
> They are also marked some way - preferably just a bit in the task
> state.
> - when you do a wakeup, you do exactly what you do now: walk the list,
> waking up each process. The ONLY difference is that if you find an
> exclusive process (easy to test for: you already look at the task state
> anyway) that you woke up, you stop early and go home. You mark the one
> you woke non-exclusive _or_ you make the exclusivity test also verify
> that the thing is not running, so two consecutive wakeup calls will
> always wake up two exclusive processes.
>
>So:
> - nonexclusive waiters are always woken up. They are at the head of the
> list.
> - _one_ exclusive process is always woken up.

I implemented the thing you described above. It seems to work :).
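
In code, the walk comes down to something like this (a standalone
sketch with illustrative names, not the real 2.2 wait-queue code):

/*
 * Sketch of the wake-one walk described above; the types and names
 * here are illustrative, not the actual kernel structures.
 */
struct task;				/* opaque for this sketch */
extern void wake_task(struct task *t);	/* stand-in for wake_up_process */

struct waiter {
	struct waiter *next;
	struct task *task;
	int exclusive;			/* set by accept(2)-style sleepers */
};

/*
 * Non-exclusive waiters sit at the head of the list and are always
 * woken; the walk stops early once it has woken one exclusive waiter,
 * after marking it non-exclusive so the next wakeup picks another one.
 */
void wake_up_list(struct waiter *head)
{
	struct waiter *w;

	for (w = head; w; w = w->next) {
		wake_task(w->task);
		if (w->exclusive) {
			w->exclusive = 0;
			break;
		}
	}
}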

To test it I developed a small program that forks a child and then, in
the parent, uses pthread_create to start 60 server threads. Each server
thread runs a blocking accept(2) loop. The previously forked child
instead runs only a connect(2) loop.
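
Roughly, the tester is shaped like this (a sketch rather than the exact
program: the port number is arbitrary and error checking is omitted):

/*
 * Sketch of the tester: 60 threads blocked in accept(2) on one
 * listening socket, plus a forked child hammering connect(2).
 * Port number is arbitrary; error checking omitted for brevity.
 */
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define NR_SERVERS	60
#define PORT		9000

static void *server(void *arg)
{
	int listen_fd = *(int *) arg;

	for (;;)			/* blocking accept(2) loop */
		close(accept(listen_fd, NULL, NULL));
	return NULL;
}

int main(void)
{
	struct sockaddr_in addr;
	pthread_t servers[NR_SERVERS];
	int listen_fd, i;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(PORT);

	listen_fd = socket(AF_INET, SOCK_STREAM, 0);
	bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr));
	listen(listen_fd, 128);

	if (fork() == 0) {		/* child: connect(2) loop */
		for (;;) {
			int fd = socket(AF_INET, SOCK_STREAM, 0);
			connect(fd, (struct sockaddr *) &addr, sizeof(addr));
			close(fd);
		}
	}

	for (i = 0; i < NR_SERVERS; i++)
		pthread_create(&servers[i], NULL, server, &listen_fd);
	for (i = 0; i < NR_SERVERS; i++)
		pthread_join(servers[i], NULL);
	return 0;
}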

Without the wake-one patch below it takes many seconds of very high
machine load to exhaust all the client-side TCP ports. With the patch
applied it takes less than a second, and the machine does not seem
overloaded during that short time. I have no exact numbers though (so
far I measured only with my eyes ;).

I am not 100% sure the patch has no deadlock conditions, but it looks
safe. The wait_for_connect code first checks whether a connection is
available before looking at signals.
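
For reference, this is roughly what that loop looks like with the patch
applied (paraphrased from memory, not the literal source):

/*
 * Paraphrase of the wait_for_connect() loop with the patch applied:
 * after every wakeup the task re-checks the accept queue _before_ it
 * acts on pending signals, so a wake-one wakeup that races with a
 * signal cannot lose a connection.
 */
add_wait_queue(sk->sleep, &wait);
for (;;) {
	current->state = TASK_INTERRUPTIBLE | TASK_WAKE_ONE;
	release_sock(sk);
	schedule();
	lock_sock(sk);
	req = tcp_find_established(&(sk->tp_pinfo.af_tcp), pprev);
	if (req)			/* connection available: take it */
		break;
	if (signal_pending(current))	/* only then honour signals */
		break;
}
remove_wait_queue(sk->sleep, &wait);
current->state = TASK_RUNNING;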

Index: kernel/sched.c
===================================================================
RCS file: /var/cvs/linux/kernel/sched.c,v
retrieving revision 1.1.1.9
diff -u -r1.1.1.9 sched.c
--- linux/kernel/sched.c 1999/05/07 00:01:50 1.1.1.9
+++ linux/kernel/sched.c 1999/05/11 00:05:35
@@ -718,31 +674,20 @@
 		goto move_rr_last;
 move_rr_back:
 
-	switch (prev->state) {
-		case TASK_INTERRUPTIBLE:
-			if (signal_pending(prev)) {
-				prev->state = TASK_RUNNING;
-				break;
-			}
-		default:
-			del_from_runqueue(prev);
-		case TASK_RUNNING:
-	}
-	prev->need_resched = 0;
-
-repeat_schedule:
-
 	/*
 	 * this is the scheduler proper:
 	 */
+	prev->need_resched = 0;
 
+repeat_schedule:
+	if (prev->state != TASK_RUNNING)
+		goto prev_not_runnable;
+repeat_schedule_runnable:
+	c = prev_goodness(prev, prev, this_cpu);
+	next = prev;
+prev_not_runnable_back:
+
 	p = init_task.next_run;
-	/* Default process to select.. */
-	next = idle_task(this_cpu);
-	c = -1000;
-	if (prev->state == TASK_RUNNING)
-		goto still_running;
-still_running_back:
 
 	/*
 	 * This is subtle.
@@ -836,13 +781,22 @@
 			p->counter = (p->counter >> 1) + p->priority;
 		read_unlock(&tasklist_lock);
 		spin_lock_irq(&runqueue_lock);
+		if (prev->state != TASK_RUNNING)
+			add_to_runqueue(prev);
 		goto repeat_schedule;
 	}
 
-still_running:
-	c = prev_goodness(prev, prev, this_cpu);
-	next = prev;
-	goto still_running_back;
+prev_not_runnable:
+	if (prev->state & TASK_INTERRUPTIBLE && signal_pending(prev))
+	{
+		prev->state = TASK_RUNNING;
+		goto repeat_schedule_runnable;
+	}
+	del_from_runqueue(prev);
+	/* Default process to select.. */
+	next = idle_task(this_cpu);
+	c = -1000;
+	goto prev_not_runnable_back;
 
 handle_bh:
 	do_bottom_half();
@@ -879,6 +833,7 @@
 {
 	struct task_struct *p;
 	struct wait_queue *head, *next;
+	int wake_one = 0;
 
 	if (!q)
 		goto out;
@@ -897,6 +852,13 @@
 		p = next->task;
 		next = next->next;
 		if (p->state & mode) {
+			if (p->state & TASK_WAKE_ONE)
+			{
+				if (wake_one)
+					continue;
+				p->state &= ~TASK_WAKE_ONE;
+				wake_one = 1;
+			}
 			/*
 			 * We can drop the read-lock early if this
 			 * is the only/last process.
@@ -1198,7 +1158,7 @@
 	read_lock(&tasklist_lock);
 	for_each_task(p) {
 		if ((p->state == TASK_RUNNING ||
-		     p->state == TASK_UNINTERRUPTIBLE ||
+		     p->state & TASK_UNINTERRUPTIBLE ||
 		     p->state == TASK_SWAPPING))
 			nr += FIXED_1;
 	}
Index: kernel/signal.c
===================================================================
RCS file: /var/cvs/linux/kernel/signal.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 signal.c
--- linux/kernel/signal.c 1999/05/07 00:01:50 1.1.1.3
+++ linux/kernel/signal.c 1999/05/10 23:12:04
@@ -387,7 +387,7 @@
 
 out:
 	spin_unlock_irqrestore(&t->sigmask_lock, flags);
-	if (t->state == TASK_INTERRUPTIBLE && signal_pending(t))
+	if (t->state & TASK_INTERRUPTIBLE && signal_pending(t))
 		wake_up_process(t);
 
 out_nolock:
Index: net/ipv4/tcp.c
===================================================================
RCS file: /var/cvs/linux/net/ipv4/tcp.c,v
retrieving revision 1.1.1.6
diff -u -r1.1.1.6 tcp.c
--- linux/net/ipv4/tcp.c 1999/04/28 20:46:58 1.1.1.6
+++ linux/net/ipv4/tcp.c 1999/05/10 23:56:51
@@ -1575,7 +1575,7 @@
 
 	add_wait_queue(sk->sleep, &wait);
 	for (;;) {
-		current->state = TASK_INTERRUPTIBLE;
+		current->state = TASK_INTERRUPTIBLE | TASK_WAKE_ONE;
 		release_sock(sk);
 		schedule();
 		lock_sock(sk);
Index: include/linux/sched.h
===================================================================
RCS file: /var/cvs/linux/include/linux/sched.h,v
retrieving revision 1.1.1.7
diff -u -r1.1.1.7 sched.h
--- linux/include/linux/sched.h 1999/05/07 00:01:33 1.1.1.7
+++ linux/include/linux/sched.h 1999/05/11 00:10:01
@@ -79,6 +79,7 @@
 #define TASK_ZOMBIE		4
 #define TASK_STOPPED		8
 #define TASK_SWAPPING		16
+#define TASK_WAKE_ONE		32
 
 /*
  * Scheduling policies

(As usual there is also a not-very-related change in my patch ;). This
time it's in the scheduler: the case where prev is still runnable was
treated as a slow path in pre-2.2.8-5, while it is really the fast
path, so I included the fix in this patch since I was forced to change
that part of the code anyway to take care of the WAKE_ONE bit.)

The patch above will apply cleanly against pre-2.2.8-5. (Here I had
actually also killed the age-old TASK_SWAPPING state, but I removed
that change to show only the worthwhile ones.)

I am not sure I have understood the problem correctly, i.e. whether
the real issue is the case where many threads sleep in accept(2) and,
at the first connection, all of them get woken up even though only the
first one scheduled will really become the peer of the connection. The
reason I am not sure is that here Apache seems to issue the accept in
only one task and to flock all the other sibling tasks... (so to get
the improvement with my current Apache we would have to make flock
wake-one instead of accept...).
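
For clarity, the Apache pattern I mean is roughly this (an illustrative
sketch, not Apache's actual source; handle_request is hypothetical):

#include <unistd.h>
#include <sys/file.h>
#include <sys/socket.h>

extern void handle_request(int fd);	/* hypothetical request handler */

/*
 * Sketch of the accept serialization I see in Apache: every child
 * takes an exclusive flock around accept(2), so the sibling tasks
 * queue up on the lock, not inside accept() itself.
 */
void child_loop(int lock_fd, int listen_fd)
{
	for (;;) {
		int fd;

		flock(lock_fd, LOCK_EX);	/* siblings sleep here */
		fd = accept(listen_fd, NULL, NULL);
		flock(lock_fd, LOCK_UN);
		if (fd >= 0) {
			handle_request(fd);
			close(fd);
		}
	}
}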

Comments?

Andrea Arcangeli
