Re: Linux 2.2.19pre2

From: Andrew Morton (andrewm@uow.edu.au)
Date: Sun Dec 24 2000 - 00:17:10 EST


Andrea Arcangeli wrote:
>
> On Sun, Dec 24, 2000 at 01:28:59PM +1100, Andrew Morton wrote:
> > This could happen with the old scheme where exclusiveness
> > was stored in the task, not the waitqueue.
> >
> > From test4:
> >
> >         for (;;) {
> >                 __set_current_state(TASK_UNINTERRUPTIBLE | TASK_EXCLUSIVE);
> >         /* WINDOW */
> >                 spin_lock_irq(&io_request_lock);
> >                 rq = get_request(q, rw);
> >                 spin_unlock_irq(&io_request_lock);
> >                 if (rq)
> >                         break;
> >                 generic_unplug_device(q);
> >                 schedule();
> >         }
> >
> > As soon as we set TASK_UNINTERRUPTIBLE|TASK_EXCLUSIVE, this
> > task becomes visible to __wake_up_common and can soak
> > up a second wakeup. [..]
>
> I disagree, no wakeup could be lost in test4 in the above section of code
> (besides the __set_current_state that should be set_current_state ;) and that's
> not an issue on either x86 or alpha, where spin_lock is a full memory barrier).
>
> Let's examine what happens in test4:
>
> [ snip ]

I was talking about a different scenario:

        add_wait_queue_exclusive(&q->wait_for_request, &wait);
        for (;;) {
                __set_current_state(TASK_UNINTERRUPTIBLE);
        /* WINDOW */
                spin_lock_irq(&io_request_lock);
                rq = get_request(q, rw);
                spin_unlock_irq(&io_request_lock);
                if (rq)
                        break;
                generic_unplug_device(q);
                schedule();
        }
        remove_wait_queue(&q->wait_for_request, &wait);

Suppose there are two tasks sleeping in the schedule().

A wakeup comes. One task wakes. It loops around and reaches
the window. At this point another wakeup gets sent to the
waitqueue. It gets directed to the task which just woke up!
It should have gone to a genuinely sleeping task, not to the
one which merely *looks* asleep because it has put itself
back into state TASK_UNINTERRUPTIBLE.

This can happen in test4.
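
To make that selection step concrete, here's a heavily simplified
sketch of the sort of loop an exclusive wake-one does. This is from
memory, not the actual __wake_up_common; the function name is mine
and I've dropped the TASK_EXCLUSIVE bookkeeping. The point is only
that the choice is made by looking at p->state:

        static void sketch_wake_one(wait_queue_head_t *q, unsigned int mode)
        {
                struct list_head *tmp;

                list_for_each(tmp, &q->task_list) {
                        wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
                        struct task_struct *p = curr->task;

                        if (!(p->state & mode))
                                continue;       /* running tasks are invisible */
                        /*
                         * A task which has looped around and re-entered
                         * TASK_UNINTERRUPTIBLE in the WINDOW matches here
                         * just like a task which is really asleep, so it
                         * soaks up the wakeup even though it is about to
                         * run anyway.
                         */
                        wake_up_process(p);
                        break;                  /* wake-one: stop here */
                }
        }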

> This race was introduced in test1X because there the task keeps asking for
> an exclusive wakeup even after getting a wakeup: until it has had the time to
> clean up and unregister explicitly.

No, I think it's the same in test4 and test1X. In current kernels
it's still the case that the woken task gets atomically switched into
state TASK_RUNNING, and then becomes "hidden" from the wakeup code.

The problem is, the task bogusly sets itself back into
TASK_UNINTERRUPTIBLE for a very short period and becomes
eligible for another wakeup.

There's not much which ll_rw_blk.c can do about this.
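
(The obvious rearrangement - checking for a request before dropping
into TASK_UNINTERRUPTIBLE - just swaps this race for the classic lost
wakeup. A sketch of the same loop reordered, only to show why the
ordering is forced:)

        for (;;) {
                spin_lock_irq(&io_request_lock);
                rq = get_request(q, rw);
                spin_unlock_irq(&io_request_lock);
                if (rq)
                        break;
                /*
                 * BAD: a request freed right here finds us still in
                 * TASK_RUNNING, so the wakeup skips us (or is dropped),
                 * and we then go to sleep on a queue which already has
                 * what we were waiting for.
                 */
                __set_current_state(TASK_UNINTERRUPTIBLE);
                generic_unplug_device(q);
                schedule();
        }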

> And btw, 2.2.19pre3 has the same race, while the alternate wake-one
> implementation in 2.2.18aa2 doesn't have it (for the same reasons).

The above scenario can happen in all kernels.

> And /* WINDOW */ is not the only window for such race: the window is the whole
> section where the task is registered in the waitqueue in exclusive mode:
>
> add_wait_queue_exclusive
> /* all is WINDOW here */
> remove_wait_queue
>
> Until remove_wait_queue runs explicitly in the task context, any second wakeup
> will keep waking up only that task (and so in turn be lost). So it should
> trigger very easily under high load, since two wakeups will happen much faster
> than the task can be rescheduled to run remove_wait_queue and clean up.
>
> Any deadlocks in test1X could be very well explained by this race condition.

Yes, but I haven't seen a "stuck in D state" report for a while.
I assume this is because this waitqueue gets lots of wakeups sent to it.

The ones in networking I expect are protected by other locking
which serialises the wakeups.

The one in the x86 semaphores avoids it by sending an extra wakeup
to the waitqueue on the way out.
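
Roughly like this - the shape of the trick from memory rather than the
actual arch/i386 semaphore code, with try_to_take() standing in for the
real sleepers/count arithmetic:

        static void sketch_down(struct semaphore *sem)
        {
                DECLARE_WAITQUEUE(wait, current);

                add_wait_queue_exclusive(&sem->wait, &wait);
                for (;;) {
                        set_current_state(TASK_UNINTERRUPTIBLE);
                        if (try_to_take(sem))   /* made-up helper */
                                break;
                        schedule();
                }
                set_current_state(TASK_RUNNING);
                remove_wait_queue(&sem->wait, &wait);
                /*
                 * The compensating wakeup: if this task soaked up a second
                 * wakeup while it sat in the window, hand it on to the next
                 * sleeper on the way out instead of letting it be lost.
                 */
                wake_up(&sem->wait);
        }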

> > Still, changing the wakeup code in the way we've discussed
> > seems the way to go. [..]
>
> I agree. I'm quite convinced it's the right way too, and of course I see why we
> moved the exclusive information into the waitqueue instead of keeping it in the
> task struct: to provide sane semantics in the long run ;).

Linus suggested at one point that we clear the waitqueue's
WQ_FLAG_EXCLUSIVE bit when we wake it up, so it falls back
to non-exclusive mode. This was to address one of the
task-on-two-waitqueues problems...
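
Something along these lines, I imagine - a sketch only, same sort of
loop as above but driven by the waitqueue flag, not a tested patch:

        static void sketch_wake(wait_queue_head_t *q, unsigned int mode,
                                int nr_exclusive)
        {
                struct list_head *tmp;

                list_for_each(tmp, &q->task_list) {
                        wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
                        struct task_struct *p = curr->task;

                        if (!(p->state & mode))
                                continue;
                        wake_up_process(p);
                        if (curr->flags & WQ_FLAG_EXCLUSIVE) {
                                /*
                                 * Once an entry has been handed a wakeup,
                                 * stop treating it as exclusive until it
                                 * re-registers, so a second wakeup arriving
                                 * while it sits in the window goes to a real
                                 * sleeper instead.
                                 */
                                curr->flags &= ~WQ_FLAG_EXCLUSIVE;
                                if (!--nr_exclusive)
                                        break;
                        }
                }
        }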
