Re: patch: blk-12

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sat Dec 09 2000 - 21:57:35 EST


Jens Axboe wrote:
>
> o Batch freeing of requests. Stock kernels have very bad behaviour
> under I/O load (load here meaning that the request list is empty,
> doesn't require much effort...), because as soon as a request is
> completed and put back on the freelist, a read/write will grab it
> and the queue will be unplugged again. This effectively disables
> elevator merging efforts completely. Note -- even though wakeups
> of wait_for_request are now not a 1-1 mapping, wake-one semantics
> are maintained.
>

Exclusive waitqueues do not give wake-one semantics. They give
us wake-0.999999999 semantics :)

 1: static struct request *__get_request_wait(request_queue_t *q, int rw)
 2: {
 3: register struct request *rq;
 4: DECLARE_WAITQUEUE(wait, current);
 5:
 6: add_wait_queue_exclusive(&q->wait_for_request, &wait);
 7: for (;;) {
 8: __set_current_state(TASK_UNINTERRUPTIBLE);
 9: spin_lock_irq(&io_request_lock);
10: rq = get_request(q, rw);
11: spin_unlock_irq(&io_request_lock);
12: if (rq)
13: break;
14: generic_unplug_device(q);
15: schedule();
16: }
17: remove_wait_queue(&q->wait_for_request, &wait);
18: current->state = TASK_RUNNING;
19: return rq;
20: }

Suppose there are two tasks on the waitqueue. Someone does a wakeup.
One task wakes, loops around to line 8. It sets TASK_UNINTERRUPTIBLE
and is now eligible for another wakeup. So if someone does a wakeup
on the waitqueue while this task is running statements 9, 10, 11, 12,
13 and 17, that wakeup will be sent to the newly woken task.

The net effect: two tasks are sleeping, there were two wakeups but
only one task woke up.

Is this a problem in practice? Presumably not. This is probably
because the waitqueue is getting wakeups sent to it all the time,
and we muddle through.

One place where this race _would_ bite is in semaphores. __down()
has the same race, but it does an extra wake_up() on the queue after it
has taken itself off, which fixes things. So `up()' can in fact be
wake-two. err, make that wake-1.999999999.

The moral is: be careful with exclusive waitqueues. They're
a minefield.

There's another interesting race with exclusive waitqueues. In
a few places in the kernel a task can be on two exclusive waitqueues
at the same time. If two CPUs simutaneously direct a wakeup to
the two waitqueues they can race and the net effect is that
they both wake the same task. Depending on the behaviour
of the woken task when it returns to the outer waitloop that
can be a lost wakeup.

My proposal to fix this with a global spinlock in __wake_up_common
got nixed at Penguin HQ. Instead I have a patch lying around here
somewhere which makes wake_up_process() return a success value. If
__wake_up_common tries to wake an exclusive process but fails, it
keeps looking for someone to wake up.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Dec 15 2000 - 21:00:18 EST