Re: [RFC][PATCH] New iovec support & VFS changes

From: Avi Kivity
Date: Tue Dec 20 2005 - 13:19:35 EST


Badari Pulavarty wrote:

On Tue, 2005-12-20 at 20:00 +0200, Avi Kivity wrote:


You can io_submit() a list of IO_CMD_PREAD[V]s and immediately io_getevents() them. In addition to specifying different file offsets you can mix reads and writes, mix file descriptors, and reap nonblocking events quickly (by specifying a timeout of zero).

Sure, it's two syscalls instead of one, but it's much more flexibles, and databases should be using aio anyway. Oh, and no kernel changes needed, apart from merging vectored aio.




Yes. We discussed this also earlier. Using AIO is the alternative.
But using AIO is not simple as doing preadv()/pwritev() for the
applications doesn't care about using AIO. AIO needs extra coding
to setup context, iocb, submits and getevents etc..


Possibly a library can do that (placing the context in thread local storage), but I see your point.

And also, inside the kernel - AIO requests go through lots of
code/routines -- before coming to ->aio_read() -- which I was
planning to avoid by having a direct syscall to do preadv/pwritev.


I'd be surprised if this isn't dominated by the cost of serving the request, even from cache.

If we can persuade the aio maintainers to add some flag to io_submit() which makes the request synchronous, it would reduce the overhead somewhat, but it is against the spirit of aio.

BTW, we still don't have vectored AIO support in the kernel.
Zack is working on it - which would add another set of
file operations aio_readv/aio_writev.


Hopefully they will supercede aio_read/aio_write?

BTW, don't databases like using many many files for their backing store? The ability to write to many fds in one call should be attractive to them.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/