Re: PATCH: Raw device IO for 2.1.131

Richard Gooch (rgooch@atnf.csiro.au)
Mon, 14 Dec 1998 22:05:46 +1100


Jes Sorensen writes:
> >>>>> "Richard" == Richard Gooch <rgooch@atnf.csiro.au> writes:
>
> Richard> pmonta@halibut.imedia.com writes:
> >> With a fast disk array it's easy to make this wastage dominate the
> >> CPU. You remark that this is an evil oddity; I'm not sure I agree.
> >> Fast I/O buses are nice. I'd *like* to be I/O bound, but it's not
> >> in the cards with vanilla Linux because the CPU is forced to be
> >> involved, to no benefit that I can see.
>
> Richard> But isn't this where sendfile(2) is used? The input file is
> Richard> your disc file and the output file is your network
> Richard> socket. The kernel triggers the DMA from disc to memory (say
> Richard> a skbuf) and then triggers the DMA from memory to the network
> Richard> interface. No cache pollution at all.
>
> Let me point out why I don't like sendfile() then. Most operating
> systems don't support sendfile() and as such it is a pain in the neck
> to have to write multiple versions of your software if you run on
> multiple operating systems. Ie. the software we use here for most of
> our data transfers is used on at least six different UNIX versions in
> house and I bet some of the other sites using this software might run
> it on other flavors.

I agree with that sentiment: I've got more #ifdef's in my code than
I'd like.

> SGI managed to get zero copy right for write() on a socket and it
> works great, I'd love having Linux do the same.

To get you what you want without resorting to sendfile(), we'd have to
be able to pin down user pages and then initiate DMA. Linus has said
he doesn't like that idea and has also pointed out he feels that the
copy operation would not be a significant overhead. Others have
mentioned other applications (video capture and processing) where
they feel an extra copy *is* significant. But is this the case with
your application? It seems to me that if you have a bunch of user
pages you want to DMA out, you have already spent a considerable
amount of time generating the data, so an extra copy is not
significant. Is that so?

The reason I ask is that I see in this discussion a number of
different applications, at least some of which present the same
arguments for the need for zero-copy. I wonder if some of those
applications don't actually need zero-copy. If so, perhaps the
applications where it is established that they *do* need zero-copy
can get by without page pinning tricks. If all of those can be solved
with page aliasing, that might be enough.

One thing that sendfile() would appear to make easier is PCI->PCI DMA
from disc to NIC. Doing it without sendfile() would require
mmap()+write(), for which PCI->PCI DMA may be much harder to support.
The mmap()+write() scheme requires a smart write() implementation, I
think, in order to distinguish between pages already present in
userspace and non-present pages that need to be DMA'ed in. Perhaps
sendfile() can make that easier?

Regards,

Richard....
