Re: Revert "aio: fix aio request leak when events are reaped by user space"

From: Benjamin LaHaise
Date: Tue Aug 19 2014 - 20:46:59 EST


On Tue, Aug 19, 2014 at 08:14:26PM +0300, Dan Aloni wrote:
> On Tue, Aug 19, 2014 at 12:54:04PM -0400, Benjamin LaHaise wrote:
> > On Tue, Aug 19, 2014 at 07:37:33PM +0300, Dan Aloni wrote:
> > > Some testing I've done today indicates that the original commit broke
> > > AIO for users that overflow the maximum number of requests per IO
> > > context (where -EAGAIN is returned).
> > >
> > > In fact, it did worse - the attached C program can easily overrun the
> > > ring buffer that is responsible for managing the completed requests,
> > > causing notifications of their completion never to be delivered.
> >
> > Argh, that would be a problem.
> >
> > ...
> > > This reverts commit b34e0e1319b31202eb142dcd9688cf7145a30bf6.
> >
> > Reverting isn't okay, as that reintroduces another regression. We need
> > to come up with a fix for this issue that doesn't reintroduce the other
> > regression for events reaped in user space. Let me have a look and see
> > what I can come up with...
>
> About the original regression you mention, is there a program you can
> point to that reproduces it? On my setups, the regression tests in
> libaio were not able to detect the current regression either.

You can trigger the behaviour with fio by using userspace event reaping.
Adding a test case for that behaviour to libaio would be a good idea.
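
For anyone who hasn't seen it: userspace reaping reads completions straight
out of the ring that the kernel maps for the io context (the io_context_t
handle returned by io_setup() is the address of that ring) and advances
ring->head itself instead of calling io_getevents(). A libaio test would
need a loop along the lines of the sketch below -- completely untested,
struct copied from fs/aio.c, and fio (the userspace_reap option of its
libaio engine, iirc) is more careful about the barriers:

#include <libaio.h>

/* Userspace view of the completion ring the kernel maps for each io
 * context; mirrors struct aio_ring in fs/aio.c.
 */
struct aio_ring {
	unsigned	id;		/* kernel internal index number */
	unsigned	nr;		/* number of io_events */
	unsigned	head;
	unsigned	tail;
	unsigned	magic;
	unsigned	compat_features;
	unsigned	incompat_features;
	unsigned	header_length;	/* size of aio_ring */
	struct io_event	io_events[0];
};

/* Reap completed events directly from the ring instead of calling
 * io_getevents().  Returns the number of events copied into 'events'.
 */
static long user_reap_events(io_context_t ctx, struct io_event *events,
			     long max)
{
	struct aio_ring *ring = (struct aio_ring *)ctx;
	long reaped = 0;

	while (reaped < max) {
		unsigned head = ring->head;

		if (head == ring->tail)
			break;			/* ring is empty */

		events[reaped++] = ring->io_events[head % ring->nr];
		__sync_synchronize();		/* consume the event before moving head */
		ring->head = (head + 1) % ring->nr;
	}
	return reaped;
}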

I thought about how to fix this, and it isn't actually that hard. Move
the put_reqs_available() call back into event consumption, and then add
code in the submit path that, when we run out of request slots, notices
free space in the event ring (events already reaped by user space) and
calls put_reqs_available() to reclaim those slots. Something along the
lines of the patch below should do it (please note, this is completely
untested!). I'll test and polish this off tomorrow, as it's getting a
bit late here.
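
One thing the patch below does not do yet: aio_complete() has to bump
ctx->completed_events for every event it adds to the ring, otherwise
refill_reqs_available() never has anything to hand back. That increment
wants to happen while completion_lock is still held, after ring->tail has
been published -- roughly (exact placement to be sorted out when I test
this):

	ctx->completed_events++;	/* slot is reclaimable once the event is reaped */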

-ben
--
"Thought is the essence of where you are now."

diff --git a/fs/aio.c b/fs/aio.c
index ae63587..749ac22 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -142,6 +142,7 @@ struct kioctx {
 	struct {
 		unsigned	tail;
 		spinlock_t	completion_lock;
+		unsigned	completed_events;
 	} ____cacheline_aligned_in_smp;
 
 	struct page		*internal_pages[AIO_RING_PAGES];
@@ -857,6 +858,40 @@ out:
 	return ret;
 }
 
+/* refill_reqs_available
+ *	Credit back request slots for completions that user space has
+ *	already reaped directly from the event ring.
+ */
+static void refill_reqs_available(struct kioctx *ctx)
+{
+	spin_lock_irq(&ctx->completion_lock);
+	if (ctx->completed_events) {
+		unsigned head, tail, avail, completed;
+		struct aio_ring *ring;
+
+		ring = kmap_atomic(ctx->ring_pages[0]);
+		head = ACCESS_ONCE(ring->head);
+		tail = ACCESS_ONCE(ring->tail);
+		kunmap_atomic(ring);
+
+		/* Events between head and tail are still sitting in the
+		 * ring; anything beyond that has been reaped by user space
+		 * and its request slot can be handed back.
+		 */
+		avail = (head <= tail) ? (tail - head)
+				       : (ctx->nr_events - head + tail);
+
+		completed = ctx->completed_events;
+		if (avail < completed)
+			completed -= avail;
+		else
+			completed = 0;
+		ctx->completed_events -= completed;
+		put_reqs_available(ctx, completed);
+	}
+	spin_unlock_irq(&ctx->completion_lock);
+}
+
 /* aio_get_req
  *	Allocate a slot for an aio request.
  *	Returns NULL if no requests are free.
@@ -865,8 +900,11 @@ static inline struct kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct kiocb *req;
 
-	if (!get_reqs_available(ctx))
-		return NULL;
+	if (!get_reqs_available(ctx)) {
+		refill_reqs_available(ctx);
+		if (!get_reqs_available(ctx))
+			return NULL;
+	}
 
 	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
 	if (unlikely(!req))
@@ -1005,7 +1043,6 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
 
 	/* everything turned out well, dispose of the aiocb. */
 	kiocb_free(iocb);
-	put_reqs_available(ctx, 1);
 
 	/*
 	 * We have to order our ring_info tail store above and test