Re: Interesting pentium-memcpy results

Albert D. Cahalan (acahalan@cs.uml.edu)
Tue, 29 Jul 1997 04:04:30 -0400 (EDT)


Chris Evans (chris@ferret.lmh.ox.ac.uk) writes:

> I just compared 2.1.46 vs. 2.1.46+pentium memcpy patch,
> and interestingly enough found that the UNIX byte benchmarks
> tended to _drop_ a fair bit, with the exception of process
> creation and execl throughput. (Note that I only ran the
> basic 'system' tests - TCP bandwidth etc. to be determined
> when I find the newer benchmarks)
>
> This is most interesting since I used to swear by the patch.
>
> It does however show us that there still is performance to
> be gained. I presume the process creation test will be using
> fork() which does a lot of memcpy'ing of various process
> credentials in kernel space.

I think it shows that the memcpy size test is significant.
Perhaps the FPU is best used only when explicitly requested
for large operations. That would mean page clearing, I guess.

big_aligned_memcpy() and big_aligned_clear() perhaps?
For 512 bytes and up, optimized for each arch.
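
A rough sketch of what such explicit entry points could look like.
This is only an illustration: the 512-byte minimum comes from the
suggestion above, while the 8-byte alignment, the BIG_COPY_* names and
the is_big_aligned() helper are assumptions made up for the example.
The bodies just forward to memcpy()/memset() where a real per-arch
version would put its optimized (possibly FPU-based) loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed minimum size, from the "512 bytes and up" suggestion. */
#define BIG_COPY_MIN   512
/* Assumed alignment; a real routine would choose this per arch. */
#define BIG_COPY_ALIGN 8

/* Hypothetical helper: nonzero when a request qualifies for the big path. */
int is_big_aligned(const void *dst, const void *src, size_t n)
{
    return n >= BIG_COPY_MIN
        && ((uintptr_t)dst % BIG_COPY_ALIGN) == 0
        && ((uintptr_t)src % BIG_COPY_ALIGN) == 0;
}

/* Explicitly requested large copy. The caller promises size and alignment,
 * so the ordinary memcpy() path needs no extra size test.
 * Placeholder body: forwards to memcpy() to keep the sketch portable. */
void *big_aligned_memcpy(void *dst, const void *src, size_t n)
{
    assert(is_big_aligned(dst, src, n));
    return memcpy(dst, src, n);
}

/* Explicitly requested large clear, e.g. for clearing a page.
 * Placeholder body: forwards to memset(). */
void *big_aligned_clear(void *dst, size_t n)
{
    assert(is_big_aligned(dst, dst, n));
    return memset(dst, 0, n);
}
```

The point of the split is that small copies never pay for a size
check or touch the FPU; only callers that know they have a large,
aligned buffer ask for the special routine.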

There may be a conflict with the user-space version.
With both the kernel and apps abusing the FPU for memcpy,
the FPU state must be saved and restored too often.