Re: [RFC][PATCH] New iovec support & VFS changes

From: Badari Pulavarty
Date: Tue Dec 20 2005 - 13:07:12 EST


On Tue, 2005-12-20 at 20:00 +0200, Avi Kivity wrote:
> Badari Pulavarty wrote:
>
> >I was trying to add support for preadv()/pwritev() for threaded
> >databases. Currently the patch is in -mm tree.
> >
> >http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.15-rc5/2.6.15-rc5-mm3/broken-out/support-for-preadv-pwritev.patch
> >
> >This needs a new set of system calls. Ulrich Drepper pointed out
> >that, instead of adding a system call for the limited functionality
> >it provides, we could add a new iovec interface as follows
> >(offset-per-segment), which provides greater functionality & flexibility.
> >
> >+struct niovec
> >+{
> >+ void __user *iov_base;
> >+ __kernel_size_t iov_len;
> >+ __kernel_loff_t iov_off; /* NEW */
> >+};
> >
> >In order to support this, we need to change all the file_operations
> >(readv/writev) and their helper functions to take this new structure.
> >
> >I took a stab at doing it and I want feedback on whether this is
> >acceptable. All the patch does is make the kernel use the new structure;
> >the existing syscalls like readv()/writev() still deal with the
> >original one to keep compatibility. (Pipes and sockets need
> >changing too, which I have not addressed yet.)
> >
> >Is this the right approach ?
> >
> >
> >
> You can io_submit() a list of IO_CMD_PREAD[V]s and immediately
> io_getevents() them. In addition to specifying different file offsets
> you can mix reads and writes, mix file descriptors, and reap nonblocking
> events quickly (by specifying a timeout of zero).
>
> Sure, it's two syscalls instead of one, but it's much more flexible,
> and databases should be using aio anyway. Oh, and no kernel changes
> needed, apart from merging vectored aio.


Yes. We discussed this earlier as well. Using AIO is the alternative.
But using AIO is not as simple as calling preadv()/pwritev() for
applications that don't otherwise care about AIO. AIO needs extra coding
to set up the context and iocbs, submit them, call getevents, etc.
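
To make the "extra coding" concrete, here is a rough sketch of a single
positioned read done through the native AIO syscalls. It is only an
illustration under assumptions: aio_pread_once() is a hypothetical helper
name, it calls the raw syscalls with the kernel ABI names from
<linux/aio_abi.h> (IOCB_CMD_PREAD rather than libaio's IO_CMD_PREAD), and
it sets up and tears down a context per call, which a real application
would not do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `len` bytes at offset `off` using native AIO.
 * Returns the number of bytes read, or a negative value on failure.
 */
static long aio_pread_once(int fd, void *buf, size_t len, off_t off)
{
    aio_context_t ctx = 0;           /* must be zero before io_setup() */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long res = -1;

    assert(buf != NULL);

    /* Step 1: set up a context able to hold one in-flight request. */
    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        perror("io_setup");
        return -1;
    }

    /* Step 2: describe the request in an iocb. */
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    /* Step 3: submit it. Step 4: wait for exactly one completion. */
    if (syscall(SYS_io_submit, ctx, 1, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
        res = (long)ev.res;          /* byte count, or negative errno */

    /* Step 5: tear the context down again. */
    syscall(SYS_io_destroy, ctx);
    return res;
}
```

Compare that with a plain pread(fd, buf, len, off), which is one line in
the caller.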

And also, inside the kernel - AIO requests go through lots of
code/routines -- before coming to ->aio_read() -- which I was
planning to avoid by having a direct syscall to do preadv/pwritev.
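
For comparison, this is roughly what an application has to do today to
get offset-per-segment reads without AIO: one pread() per segment. It is
a userspace sketch only, not the patch. struct user_niovec and
emulate_nreadv() are made-up names that mirror the niovec layout quoted
above, and the short-read handling is my assumption about reasonable
semantics.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userspace mirror of the proposed struct niovec. */
struct user_niovec {
    void  *iov_base;   /* destination buffer */
    size_t iov_len;    /* bytes wanted for this segment */
    off_t  iov_off;    /* file offset for this segment */
};

/*
 * Emulate an offset-per-segment vectored read with one pread() per
 * segment. Returns the total bytes read, or -1 if the first segment
 * fails. Stops early on a short read or on a later error.
 */
static ssize_t emulate_nreadv(int fd, const struct user_niovec *vec, int count)
{
    ssize_t total = 0;

    assert(count >= 0);

    for (int i = 0; i < count; i++) {
        ssize_t n = pread(fd, vec[i].iov_base, vec[i].iov_len,
                          vec[i].iov_off);
        if (n < 0)
            return total > 0 ? total : -1;
        total += n;
        if ((size_t)n < vec[i].iov_len)
            break;     /* short read: report what we have so far */
    }
    return total;
}
```

Each segment costs a separate syscall here, which is the overhead a
single vectored call would remove.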

BTW, we still don't have vectored AIO support in the kernel.
Zack is working on it - which would add another set of
file operations aio_readv/aio_writev.

Thanks,
Badari
