Re: Revert "aio: fix aio request leak when events are reaped by user space"

From: Benjamin LaHaise
Date: Sun Aug 24 2014 - 14:05:47 EST

On Fri, Aug 22, 2014 at 09:51:10PM +0300, Dan Aloni wrote:
> Ben, seems that the test program needs some twidling to make the bug
> appear still by setting MAX_IOS to 256 (and it still passes on a
> kernel with the original patch reverted). Under this condition the
> ring buffer size remains 128 (here, 32*4 CPUs), and it is overrun on
> the second attempt.

Thanks for checking that. I've made some further changes to your test
program to be able to support both userspace event reaping and using the
io_getevents() syscall. I also added a --max-ios= parameter to allow
using different values of MAX_IOS for testing. A copy of the modified
source is available at .
With this version of the patch, both means of reaping events now work
corectly. The aio_bug.c code still needs to be added to libaio's set
of tests, but that's a separate email.

> I think I have found two problems with your patch: first, the
> completed_events field is never decremented so it goes up until 2^32
> wraparound. So I tested with 'ctx->completed_events -= completed;'
> there (along with some prints), but testing revealed that this didn't
> solve the problem, so secondly, I also fixed the 'avail = ' line. The
> case where the 'head > tail' case didn't look correct to me.

Yeah, sorry for not checking that. I didn't have time to sit down for a
couple of hours to test and verify it until today. Your changes for
handling ring wrap around correctly look good after inspecting the code
and how it reacts to the various conditions.

> So the good news is that it works now with fix below and MAX_IOS=256
> and even with MAX_IOS=512. You can git-amend this it to your patch I
> guess.

I ended up making a bunch of changes to the patch to ensure the user
space tainted value of ring->head is clamped. I also made the code use
ctx->tail to avoid having to do a mod operation on the value of tail as
well. The code is rearranged to optimistically call refill_reqs_available()
in aio_complete() to try to avoid doing this only at io_submit() time.
This helps to avoid having to perform refill from the io_submit() path,
as io_submit() does not otherwise have to take ctx->completion_lock
and potentially bounce that cache line around.

If you can have a quick look and acknowledge with your Signed-off-by, I'll
add it to the commit and send a pull request. With these changes and the
modified aio_bug.c, the test passes using various random values for
max_ios, as well as both with and without --user_getevents validating that
both regressions involved have now been fixed.

> BTW, I am not an expert on this code so I am still not sure that
> 'ctx->completed_events' wouldn't get wrong if for instance - one SMP
> core does userspace event reaping and another calls io_submit(). Do
> you think it would work alright in that case?

This should be SMP safe: both ctx->completed_events and ctx->tail are
protected ctx->completion_lock. Additionally, possible races with
grabbing the value of ring->head should also be safe, and I added a
comment explaining why. Does that address your concerns? Cheers,

"Thought is the essence of where you are now."
---- SNIP: from git:// ----