Re: O_DIRECT question

From: Viktor
Date: Fri Jan 12 2007 - 11:59:42 EST


Linus Torvalds wrote:
>>OK, madvise() used with an mmap'ed file allows reads from a file
>>with zero-copy between kernel and user buffers, without polluting cache
>>memory unnecessarily. But how about writes? How does one do zero-copy
>>writes to a file, without polluting cache memory and without O_DIRECT?
>>Am I missing the appropriate interface?
>
>
> mmap()+msync() can do that too.

Sorry, I wasn't sufficiently clear. mmap()+msync() can't be used for
that if the data to be written comes from some external source, like
video capture hardware, which DMAs data directly into user-space
buffers. Using an mmap'ed area for those DMA buffers doesn't look like
a good idea because, e.g., it will involve unneeded disk reads on the
first page faults.

So some O_DIRECT-like interface should exist in the system. Also, as
Michael Tokarev noted, operations on mmap'ed areas don't provide good
ways to handle errors, which effectively makes them unusable for
anything serious.
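
To make the two objections concrete, here is a minimal sketch of the
mmap()+msync() write path being discussed. The helper name
write_via_mmap() is hypothetical, and the sketch assumes fd was opened
O_RDWR and that the data starts at offset 0. Note the memcpy(): data
that already sits in a separate DMA buffer still has to be copied into
the mapping by the CPU. Also, as far as I understand, a failed page-in
during that copy arrives as SIGBUS rather than as a return value.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): store len bytes from src
 * at offset 0 of fd through a shared mapping.
 * fd must be open O_RDWR. Returns 0 on success, -1 on failure. */
int write_via_mmap(int fd, const void *src, size_t len)
{
    if (len == 0)
        return 0;

    /* Make sure the file is large enough to back the mapping. */
    if (ftruncate(fd, (off_t)len) < 0)
        return -1;

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    /* This is a CPU copy: the source buffer and the mapping are
     * different pages. Faults taken here cannot report an error code. */
    memcpy(map, src, len);

    /* Synchronous write-back; the only point where an error is returned. */
    int rc = msync(map, len, MS_SYNC);

    if (munmap(map, len) < 0)
        rc = -1;
    return rc;
}
```

A caller would do something like write_via_mmap(fd, frame, frame_size)
once per captured frame, which is exactly the extra copy I'd like to
avoid.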

> Also, regular user-space page-aligned data could easily just be moved into
> the page cache. We actually have a lot of the infrastructure for it. See
> the "splice()" system call. It's just not very widely used, and the
> "drop-behind" behaviour (to then release the data) isn't there. And I bet
> that there's lots of work needed to make it work well in practice, but
> from a conceptual standpoint the O_DIRECT method really is just about the
> *worst* way to do things.

splice() needs two file descriptors, but looking at it I found the
vmsplice() syscall, which, it seems, can do what is needed, although
I'm not sure it can work with files and zero-copy. Thanks for pointing
out those interfaces.
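
A rough sketch of how the two calls might be combined: vmsplice() maps
the user buffer into a pipe, then splice() moves the pipe contents to
the file. The helper name write_via_vmsplice() and the 64 KiB chunk
size (my guess at the default pipe capacity) are assumptions, and I
have not verified that this path is really zero-copy or that it keeps
data out of the page cache; that likely depends on the kernel version
and the filesystem. Treat it as an illustration of the call sequence
only.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): push len bytes from buf
 * into out_fd by way of a pipe. out_fd must not be opened O_APPEND.
 * Returns 0 on success, -1 on failure. */
int write_via_vmsplice(int out_fd, const void *buf, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    const char *cur = buf;
    int rc = 0;

    while (len > 0 && rc == 0) {
        /* Assumed chunk size; vmsplice() may still accept less. */
        size_t chunk = len < 65536 ? len : 65536;
        struct iovec iov = { .iov_base = (void *)cur, .iov_len = chunk };

        /* Step 1: attach the user pages to the write end of the pipe. */
        ssize_t in = vmsplice(p[1], &iov, 1, 0);
        if (in <= 0) {
            rc = -1;
            break;
        }
        cur += in;
        len -= (size_t)in;

        /* Step 2: drain exactly those bytes from the pipe into the file. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0) {
                rc = -1;
                break;
            }
            in -= out;
        }
    }

    close(p[0]);
    close(p[1]);
    return rc;
}
```

The buffer should not be modified between the vmsplice() and the
matching splice(), since the pipe only references the pages.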
