I have to admit, I can't see where. (That is, I'm aware
there are lots of copies there now, but which one do you
think can't be got rid of?)
> So the cold cache numbers can
> never be faster than 1/4 of memory speed: DMA in, copy, DMA out.
> The best numbers are 1/2 memory speed: DMA in, DMA out.
Realistically, PC hardware can't do TCP checksumming, so
the best you can do is 1/3 memory speed: DMA in, checksum,
DMA out. Or can you copy-and-checksum directly to the
buffers of the Ethernet card?
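To make "copy-and-checksum" concrete, here is a rough user-space
sketch of folding the Internet checksum (the 16-bit one's-complement
sum from RFC 1071) into the copy loop, so the data is read only once.
The name copy_and_checksum is made up for illustration. A real
routine would work in word-sized units, and whether the destination
can be the card's own buffers is exactly the open question above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Copies len bytes from src to dst and returns the 16-bit Internet
 * checksum (RFC 1071) of the data, computed during the same pass.
 * Bytes are paired as big-endian 16-bit words; an odd trailing byte
 * is padded with a zero low byte.
 */
uint16_t copy_and_checksum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint64_t sum = 0;   /* wide accumulator, folded to 16 bits at the end */
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        d[i] = s[i];
        d[i + 1] = s[i + 1];
        sum += ((uint32_t)s[i] << 8) | s[i + 1];
    }
    if (i < len) {      /* odd length: one byte left over */
        d[i] = s[i];
        sum += (uint32_t)s[i] << 8;
    }

    /* Fold the carries back in (one's-complement addition). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

With this, the separate checksum pass disappears: the loop reads the
source once and writes the destination once.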
> 2) On SGI's, for server type of operations, the mmap() is the bottleneck.
> You are setting up and tearing down a virtual mapping that you don't
> need: the ``currency'' you are dealing in at both ends is physical
> pages, not virtual pages. This starts to become a bottleneck for
> files smaller than 8K (Linux) or 32K (most other operating systems).
> Linux is better because it is lighter.
I must admit I don't understand why this is so. Surely the mmap
just sets up some kernel structures; it doesn't actually create
any virtual-physical mapping. Doesn't that happen when we fault
and the memory is read in? So isn't the overhead per page, and not
per mapping? (Obviously not, but why?)
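For reference, this is roughly the per-file sequence I have in mind
for a server that sends a file through mmap(). It is only a sketch:
send_file_mmap is a made-up name, EINTR and partial-failure handling
are left out, and the comments about where the costs fall reflect my
understanding above rather than measurements.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Sends the whole file at path to out_fd through a temporary
 * read-only mapping. Returns the number of bytes written, or -1.
 */
ssize_t send_file_mmap(int out_fd, const char *path)
{
    struct stat st;
    size_t len, off = 0;
    char *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    len = (size_t)st.st_size;

    /* Per-mapping cost: set up the mapping (as I understand it,
     * kernel bookkeeping only; no pages are touched yet). */
    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;

    /* Per-page cost: pages are faulted in as write() reads them. */
    while (off < len) {
        ssize_t n = write(out_fd, p + off, len - off);
        if (n < 0) {
            munmap(p, len);
            return -1;
        }
        off += (size_t)n;
    }

    /* Per-mapping cost again: tear the mapping down. */
    munmap(p, len);
    return (ssize_t)off;
}
```

For a small file the mmap()/munmap() pair is paid in full even though
only a page or two is ever touched, which is presumably where the
quoted 8K/32K threshold comes from.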
I wonder if we can use this to do lazy mmapping. For a server
process that doesn't look at its data we can just get tricky
in copy_from_user and copy from the physical address, without
ever having to map anything into virtual memory. Some locking
required...
-- Erik Corry