[PATCH] eventfd signal race in aio_complete()

From: Jeff Roberson
Date: Fri Mar 07 2008 - 22:03:13 EST


Hello,

I have an application that uses eventfd to merge socket and aio blocking with epoll in one thread. Under heavy load the application sometimes hangs: epoll reports that the eventfd has an event ready, but reading the aio completions produces no results. Further investigation revealed that the aiocb did complete later, with no new eventfd event to announce it, and that completing it based on a timer resolved the application hang.

This pointed to the eventfd being signaled prematurely and I verified that this was indeed the problem. aio_complete() calls eventfd_signal() before the event is actually placed on the completion ring. On a multi-processor system it is possible to read the event from epoll and return to userspace before aio_complete() finishes.
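
To make the window concrete, here is a small userspace model of the two orderings. It is only a sketch and not kernel code: struct toy_ctx and the toy_* functions are made-up names, and the reader on the other CPU is simulated by a plain function call placed at the point where it could run.

```c
#include <assert.h>

/* Toy model only: a completion ring and an eventfd-like counter. */
struct toy_ctx {
    int ring_events;   /* completions visible to the reader */
    int efd_count;     /* pending eventfd signals */
};

/*
 * What a reader does if it is woken at this instant: consume the
 * signal, then harvest whatever is in the ring. Returns the number of
 * completions found, or -1 if there was no signal to wake it.
 */
static int toy_reader_wakeup(struct toy_ctx *c)
{
    int got;

    assert(c->ring_events >= 0 && c->efd_count >= 0);
    if (c->efd_count == 0)
        return -1;
    c->efd_count = 0;
    got = c->ring_events;
    c->ring_events = 0;
    return got;
}

/* Old ordering: signal first, publish the completion afterwards. */
static int toy_signal_then_publish(struct toy_ctx *c)
{
    int seen;

    c->efd_count += 1;
    seen = toy_reader_wakeup(c);   /* reader runs inside the window */
    c->ring_events += 1;           /* completion lands too late */
    return seen;
}

/* Patched ordering: publish the completion, then signal. */
static int toy_publish_then_signal(struct toy_ctx *c)
{
    c->ring_events += 1;
    c->efd_count += 1;
    return toy_reader_wakeup(c);   /* reader runs after both steps */
}
```

With the old ordering the simulated reader finds zero completions and the event is left in the ring with no signal pending, which matches the hang described above. With the patched ordering the same reader finds the completion.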

The enclosed patch simply moves the signaling to the bottom of the function. I'm not 100% familiar with this code, and it looks like spurious wakeups may now be possible, but there will be no missed wakeups. An application may also race the other way now and receive the aio completion before the signal, which still leaves it with a signal and no completion. Signaling while the kioctx is locked would resolve this, but I was hesitant to introduce further nesting of spinlocks that might be taken in a different order elsewhere.
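
Since a signal with no completion remains possible, the consumer has to treat an empty harvest as harmless. A minimal sketch of such a consumer follows, assuming a nonblocking eventfd. drain_on_wakeup(), harvest_fn and toy_harvest() are hypothetical names; in a real program the harvest callback would presumably wrap io_getevents() with a zero timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical callback: harvest up to `max` completions without
 * blocking. Returns the number harvested, 0 if none, -1 on error.
 */
typedef int (*harvest_fn)(void *arg, int max);

/*
 * Called when epoll reports the eventfd readable. The counter is
 * cleared first and the ring drained second, so a completion that
 * arrives after the drain leaves its signal pending for the next
 * epoll_wait(). Returns the number of completions harvested; 0 means
 * a spurious wakeup and is not an error.
 */
static int drain_on_wakeup(int efd, harvest_fn harvest, void *arg)
{
    uint64_t signals;
    ssize_t r;
    int total = 0;
    int n;

    assert(harvest != NULL);

    /* EAGAIN only means the counter was already zero. */
    r = read(efd, &signals, sizeof(signals));
    if (r < 0 && errno != EAGAIN && errno != EINTR)
        return -1;

    /* Do not trust the counter value: harvest until the ring is empty. */
    while ((n = harvest(arg, 16)) > 0)
        total += n;

    return n < 0 ? -1 : total;
}

/* Stand-in harvester for experiments: `arg` points to a pending count. */
static int toy_harvest(void *arg, int max)
{
    int *pending = arg;
    int n = *pending < max ? *pending : max;

    *pending -= n;
    return n;
}
```

The caller goes back to epoll_wait() whatever the return value is, so both a signal without a completion and a completion without a signal are absorbed.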

Please keep me in the cc line for any necessary replies.

Thanks,
Jeff

Signed-off-by: Jeff Roberson <jeff@xxxxxxxxxxx>

--- aio.c.orig 2008-03-08 00:23:50.000000000 +0000
+++ aio.c 2008-03-08 00:24:32.000000000 +0000
@@ -946,14 +946,6 @@ int fastcall aio_complete(struct kiocb *
 		return 1;
 	}

-	/*
-	 * Check if the user asked us to deliver the result through an
-	 * eventfd. The eventfd_signal() function is safe to be called
-	 * from IRQ context.
-	 */
-	if (!IS_ERR(iocb->ki_eventfd))
-		eventfd_signal(iocb->ki_eventfd, 1);
-
 	info = &ctx->ring_info;

 	/* add a completion event to the ring buffer.
@@ -1010,6 +1002,15 @@ put_rq:
 		wake_up(&ctx->wait);

 	spin_unlock_irqrestore(&ctx->ctx_lock, flags);
+
+	/*
+	 * Check if the user asked us to deliver the result through an
+	 * eventfd. The eventfd_signal() function is safe to be called
+	 * from IRQ context.
+	 */
+	if (!IS_ERR(iocb->ki_eventfd))
+		eventfd_signal(iocb->ki_eventfd, 1);
+
 	return ret;
 }