Re: next-20130117 - kernel BUG with aio

From: Kent Overstreet
Date: Thu Jan 24 2013 - 16:20:03 EST


On Thu, Jan 24, 2013 at 12:22:21PM -0500, Valdis.Kletnieks@xxxxxx wrote:
> On Wed, 23 Jan 2013 20:10:03 +0800, Hillf Danton said:
>
> > Try again?
> > ---
> >
> > --- a/fs/aio.c Tue Jan 22 21:37:54 2013
> > +++ b/fs/aio.c Wed Jan 23 20:06:14 2013
>
> Now seeing this:
>
> [ 2941.495370] ------------[ cut here ]------------
> [ 2941.495379] WARNING: at fs/aio.c:336 put_ioctx+0x1cb/0x252()
>
> Same warn location, but different traceback?

Finally reproduced it (thanks to Zach Brown for the idea - using a
loopback device) - it's triggered when there's outstanding kiocbs when
we're trying to tear down the kioctx.

Found a couple bugs once I was able to reproduce it. Besides the null
pointer deref that Hillf posted a patch for, cancellation was fubar -
kiocb_cancel() shouldn't be marking the kiocb as cancelled if it didn't
have a cancel callback.

The other two bugs I found were both a result of the fact that
aio_run_iocb() touches the kiocb after passing it off to a method that
may call aio_complete() on it - which is something I originally missed.

Digression: When I was refactoring, I was hoping to be able to change
the kiocb refcounting so that the refcount's just initialized to one,
and the initial refcount is dropped when aio_complete() is called - and
since nothing outside of the aio code messes with the kiocb refcount, it
might be possible to get rid of the kiocb refcount entirely if I can
figure out how to deal with cancellation.

Anyways - that didn't work out, or at least it's going to take more
work. The two bugs that resulted from that were:

- The "aio: kill ki_retry" patch dropped the second kiocb ref that
io_submit_one drops when it returns. But aio_rw_vect_retry() touches
the kiocb after passing it off to f_op->aio_(read|write) which will
call aio_complete(), so this is a use after free.

- The "aio: smoosh struct kiocb" patch assumed that some of the fields
in struct kiocb aren't needed after aio_complete() has been called.
This is _almost_ true, but again aio_rw_vect_retry() looks at those
fields which ends up determining whether aio_run_iocb() calls
aio_complete() itself.

This seems _ridiculously_ sketchy to me, or at least convoluted...
but it'll take awhile to figure out how to clean that up and I'm not
in a great hurry to do so.


So, Andrew - that "smoosh struct kiocb" patch should just be dropped,
even if I fixed that issue clearly the idea is a lot less safe than I
thought. I've got patches for the other stuff I'm going to mail out
momentarily.

Ben, Zach - the cancellation stuff (both the fix and the
other changes) need more review, and we need a saner way of testing
cancellation. Maybe it'd be worth implementing cancellation for regular
block devices just so that we have a way of testing it :/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/