Re: Possible improvement to pipe throughput

Linus Torvalds (torvalds@cs.helsinki.fi)
Fri, 27 Sep 1996 10:19:52 +0300 (EET DST)


On Fri, 27 Sep 1996, Joerg Pommnitz wrote:
>
> Linus had done such an implementation. If I remember correctly he thought
> it was only good for high benchmark numbers without real world significance.

Actually, if done well it would indeed be beneficial to use some memory
mapping primitives to handle pipes. However, the best implementation is
rather complex, and there are some non-obvious pitfalls when thinking about
page table modifications and read()/write().

The implementation I did was very aggressive in using page mapping to get
good performance, and it essentially gave "unlimited" bandwidth (i.e. no copies
at all if the source and destination were properly aligned and neither the
reader nor the writer actually changed the buffer). However, for normal
things it didn't seem to make much of a difference.

Now, the _right_ solution is not to be all that aggressive, because the
optimal cases never happen in real life. I think the best performance can
be gotten by something like this:

- pipe_write():
look up the memory area the user is writing from (this is "free",
since we have to do it anyway for "verify_area()"). If it's a shared
memory object or is a file mapping, just copy it the old way, because
otherwise we can't guarantee that the data in the mapping doesn't
change.

If it's a private page, look up the physical page. If it's swapped
out, again do a normal copy (that will swap it in), because that case
isn't performance-critical anyway so we do the "safe" thing.

Finally, if it's a private page and exists in memory, just remember
the kernel address of the page and sleep until the reader has copied
the data out.

- pipe_read():
copy to the reading process either from the kernel buffer (into which
we copied the writer's data) or from the original page that we looked
up.
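To make the write/read split above concrete, here is a small user-space
model of it. It is only a sketch under stated assumptions, not kernel code:
the names (enum src_kind, struct pipe_model, model_pipe_write(),
model_pipe_read()) are hypothetical, the "page lookup" is replaced by a
caller-supplied classification, one write is limited to one page, and
"sleeping" is just a flag that the reader clears once the data is consumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the page the writer's buffer lives in. */
enum src_kind {
    SRC_SHARED_OR_FILE,   /* shared memory or file mapping: may change under us */
    SRC_PRIVATE_SWAPPED,  /* private but swapped out: not worth optimizing */
    SRC_PRIVATE_RESIDENT  /* private and in memory: safe to read in place */
};

#define MODEL_PAGE_SIZE 4096  /* assumption: at most one page per write */

struct pipe_model {
    char kbuf[MODEL_PAGE_SIZE]; /* kernel buffer for the "old way" copy */
    const char *direct;         /* writer's page, remembered instead of copied */
    size_t len;                 /* bytes still pending for the reader */
    size_t off;                 /* read offset into the pending data */
    int writer_sleeping;        /* writer may not return until data is consumed */
    size_t copies;              /* memcpy() calls performed so far */
};

/* Model of pipe_write(): returns the number of bytes accepted. */
static size_t model_pipe_write(struct pipe_model *p, const char *src,
                               size_t n, enum src_kind kind)
{
    assert(p->len == 0); /* the model holds at most one pending chunk */
    if (n > MODEL_PAGE_SIZE)
        n = MODEL_PAGE_SIZE;
    if (kind == SRC_PRIVATE_RESIDENT) {
        /* No copy: remember where the data is and "sleep" until the
         * reader has copied it out, so the writer can't change it. */
        p->direct = src;
        p->writer_sleeping = 1;
    } else {
        /* Safe path: copy into the kernel buffer (in the real kernel
         * this copy would also swap a swapped-out page back in). */
        memcpy(p->kbuf, src, n);
        p->copies++;
        p->direct = NULL;
    }
    p->len = n;
    p->off = 0;
    return n;
}

/* Model of pipe_read(): copy out from whichever source holds the data. */
static size_t model_pipe_read(struct pipe_model *p, char *dst, size_t n)
{
    const char *from = p->direct ? p->direct : p->kbuf;
    if (n > p->len)
        n = p->len;
    if (n == 0)
        return 0;
    memcpy(dst, from + p->off, n);
    p->copies++;
    p->off += n;
    p->len -= n;
    if (p->len == 0) {
        p->direct = NULL;
        p->writer_sleeping = 0; /* all data consumed: wake the writer */
    }
    return n;
}
```

In the private-resident case the only memcpy() happens in the reader; every
other case falls back to the two-copy path, which is exactly the "if in
doubt, do the safe thing" trade-off described above.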

Now, the above means that we always copy at least _once_ (in the pipe
reader), and if in doubt we copy twice (the same way we do now). But the
normal case now should be that we copy just once, so we have essentially
doubled pipe throughput.
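The "doubled throughput" figure is copy arithmetic: the old path moves every
byte twice (writer to kernel buffer, kernel buffer to reader), the new one
moves a private, resident page once. A back-of-the-envelope sketch, assuming
pipe throughput is bound by memcpy bandwidth; the function names are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Old scheme: every byte is copied twice (writer -> kernel buffer -> reader). */
static size_t bytes_copied_old(size_t pages, size_t page_size)
{
    return pages * 2 * page_size;
}

/* Proposed scheme: "direct" pages (private, resident writer memory) are
 * copied once, by the reader; all other pages still take two copies. */
static size_t bytes_copied_new(size_t pages, size_t direct, size_t page_size)
{
    assert(direct <= pages);
    return (direct + 2 * (pages - direct)) * page_size;
}
```

When every page qualifies, bytes_copied_new() is half of bytes_copied_old();
when none does, the two are equal, so the worst case is no slower than what
we do now.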

Note that this doesn't actually _change_ any page tables or anything like
that. Changing page tables is getting expensive enough that it's
questionable whether it really helps in real life, especially on SMP etc. So
we do use the memory mapping, but only to look things up on the writer side.

(Why the writer, not the reader? Partly because the writer avoids some
problems that the reader has: dirty bits on page tables when the page is
being modified by the read().)

Linus