Re: [patch] sys_epoll ...

From: Andrew Morton (akpm@digeo.com)
Date: Sun Oct 20 2002 - 21:01:20 EST


Davide Libenzi wrote:
>
> On Sun, 20 Oct 2002, Andrew Morton wrote:
>
> > + if (ep->eventcnt || !timeout)
> > + break;
> > + if (signal_pending(current)) {
> > + res = -EINTR;
> > + break;
> > + }
> > +
> > + set_current_state(TASK_INTERRUPTIBLE);
> > +
> > + write_unlock_irqrestore(&ep->lock, flags);
> > + timeout = schedule_timeout(timeout);
> >
> > You should set current->state before performing the checks.
>
> Why this Andrew ?
>

Well I'm assuming that you don't want to sleep if, say,
ep->eventcnt is non-zero. The code is currently (simplified):

        add_wait_queue(...);
        if (ep->eventcnt)
                break;
        /* window here */
        set_current_state(TASK_INTERRUPTIBLE);
        schedule();

If another CPU increments eventcnt and sends this task a wakeup in that
window, it is missed and we still sleep. The conventional fix for that
is:

        add_wait_queue(...);
        set_current_state(TASK_INTERRUPTIBLE);
        if (ep->eventcnt)
                break;
        /* harmless window here */
        schedule();

So if someone delivers a wakeup in the "harmless window" then this task
still calls schedule(), but the wakeup has turned the state from
TASK_INTERRUPTIBLE into TASK_RUNNING, so the schedule() doesn't actually
take this task off the runqueue. This task will zoom straight through the
schedule() and will then loop back and notice the incremented ep->eventcnt.

So it is important that the waker increment eventcnt _before_ delivering
the wake_up, too.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Oct 23 2002 - 22:00:51 EST