I'd always assumed that Apache already used mmap/write but, having
started reading this thread and looking at the source of 1.1.3, I see
that it doesn't. Currently, everything gets read into userspace 8K at
a time and then blasted straight out again. Yuck. You're clearly
aware that going straight from the page cache via mmap/write is a big
win, but you claim that setting up the mapping costs so much that
you'd need to cache it. Have you verified/benchmarked that, or is it
just a gut feeling (or perhaps a result from other Unices)? I haven't
tested it myself but here are a couple of Linux-specific things to
bear in mind:
(1) Linux holds vma information in AVL trees rather than in lists with
extra bits kludged on as a performance hack (as SVR4 does). Although
this adds a slight overhead to inserting a mapping, it may make the
behaviour differ significantly (for the better, with any luck) from
other Unices when there is a lot of mmap activity and a lot of active
mappings in the same process.
(2) Linux syscalls are usually faster and cheaper than those of other
Unices, as are things like page table manipulation. With mmap vs. read
you're trading memory bandwidth for page table manipulation, and you
may find that the balance point moves in favour of mmap on Linux.
No hard figures here, I'm afraid, but it might be worth investigation.
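For anyone who wants to put numbers on it, here is a minimal sketch in C
of the two approaches being compared. It is not Apache's code: the names
write_all, send_read_write and send_mmap_write are made up for
illustration, EINTR and partial-failure handling are left out, and it
maps the whole file in one go, which a real server might not want to do
for large files. Timing each function over the same set of files (once
cold, once with the page cache warm) should show where the balance
point falls on a given system.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd.
 * Returns 0 on success, -1 on error (EINTR is not retried). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Approach 1: copy the file through an 8K userspace buffer.
 * Returns the number of bytes sent, or -1 on error. */
static long send_read_write(int in_fd, int out_fd)
{
    char buf[8192];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}

/* Approach 2: map the file and write straight from the mapping.
 * Returns the number of bytes sent, or -1 on error. */
static long send_mmap_write(int in_fd, int out_fd)
{
    struct stat st;
    void *p;
    int rc;

    if (fstat(in_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;               /* a zero-length mmap would fail */

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, in_fd, 0);
    if (p == MAP_FAILED)
        return -1;

    rc = write_all(out_fd, p, (size_t)st.st_size);
    munmap(p, (size_t)st.st_size);  /* this setup/teardown is the cost in question */
    return rc < 0 ? -1 : (long)st.st_size;
}
```

The mmap and munmap calls around the write are the per-request cost
that caching the mapping would avoid, so it is that pair that needs
measuring against the extra copy in the read loop.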
--Malcolm
--
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services