Actually, that's _only_ due to the memcpy() fixes. The buffer cache
fixes were a non-performance issue, and "only" fixed some behavioural
things when unlinking or truncating a file that was the backing-store
for a memory mapping.

I'm not too surprised about the 30% figure, actually: for some silly
reason the alpha kernel used to use the generic memcpy (which copies a
byte at a time) instead of the optimized memcpy, despite the fact
that I had actually _written_ the optimized memcpy a long time ago.

Now, the optimized memcpy() is roughly ten times faster than the
byte-at-a-time one, and _on_top_of_that_ is also much better for cache
performance. And it's used a _lot_, notably for page copying at a C-O-W
fault.
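
To make the difference between the two kinds of copy concrete, here is a minimal sketch in portable C. It is not the alpha kernel's memcpy: the names byte_copy and word_copy are hypothetical, the 8-byte word size is an assumption, and the word loop only runs when both pointers happen to be 8-byte aligned. A real optimized memcpy would also deal with misaligned and overlapping cases and would typically be hand-tuned for the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic copy: one load and one store per byte. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/*
 * Word-at-a-time copy (illustrative only, names are hypothetical).
 * Assumes non-overlapping buffers. Falls back to byte_copy() unless
 * both pointers are 8-byte aligned. Note: accessing arbitrary memory
 * through uint64_t pointers relies on the compiler not enforcing
 * strict aliasing (the kernel builds with -fno-strict-aliasing).
 */
void *word_copy(void *dst, const void *src, size_t n)
{
    if ((((uintptr_t)dst | (uintptr_t)src) % sizeof(uint64_t)) != 0)
        return byte_copy(dst, src, n);

    uint64_t *d = dst;
    const uint64_t *s = src;

    /* Move eight bytes per iteration while a full word remains. */
    for (; n >= sizeof(uint64_t); n -= sizeof(uint64_t))
        *d++ = *s++;

    /* Copy the 0..7 leftover bytes one at a time. */
    byte_copy(d, s, n);
    return dst;
}
```

The word loop does one eighth as many loads and stores as the byte loop for the same amount of data, which is where most of the gain in a sketch like this would come from.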

Some profiling on the x86 side shows that the kernel spends something
like 20% of its time just copying and clearing memory, so it really does
make a huge difference.
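
As a rough back-of-the-envelope check, the two figures above can be plugged into the usual Amdahl's-law formula. This is only a sketch under loud assumptions: it pretends the whole 20% is spent in copies that get the full tenfold speedup, it ignores the cache effects, and it mixes an x86 profile with an alpha memcpy figure, so it is not meant to reproduce the 30% number. The function name overall_speedup is hypothetical.

```c
/*
 * Amdahl's law: overall speedup when a given fraction of the run time
 * is made `factor` times faster and the rest is left unchanged.
 * Illustrative helper only, not from the original message.
 */
double overall_speedup(double fraction, double factor)
{
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

With fraction = 0.20 and factor = 10.0 this gives 1 / (0.80 + 0.02), or about 1.22, i.e. kernel time dropping by roughly 18% from the memcpy change alone under these assumptions.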

Linus