Re: Speed of memcpy, csum_partial and csum_partial_copy

Robert L Krawitz (rlk@tiac.net)
Sun, 9 Jun 1996 11:38:02 -0400


From: alan@lxorguk.ukuu.org.uk (Alan Cox)
Date: Sat, 8 Jun 1996 20:26:51 +0100 (BST)

The SMP FPU code is subtly different as it saves the FPU state on
all context switches and "knows" the FPU won't be used early in the
system startup before the processes are set up right. I guess that
is what causes his locks. The other item is that the kernel process
cannot sleep while doing an FPU copy, as it might wake up on the
_other_ processor.

Perhaps the FPU memcpy should have a lockout that permits it to be
disabled if/when it's not safe.
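
A minimal sketch of what such a lockout could look like, in C. Every
name here (fpu_copy_enabled, guarded_copy, fpu_copy_standin, the two
counters) is made up for illustration and is not taken from the kernel
or from any patch. The "FPU" path is only a plain-C stand-in that moves
8 bytes at a time; the point is the flag check and the fallback to the
ordinary integer copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lockout flag: 0 = FPU copy disabled, 1 = allowed.
   It starts disabled, so early startup code never takes the FPU path. */
volatile int fpu_copy_enabled = 0;

/* Hypothetical counters, present only to show which path ran. */
unsigned long fpu_copies, int_copies;

/* Stand-in for the FPU copy: moves 8 bytes at a time in plain C.
   A real version would be assembly using FPU loads and stores. */
void *fpu_copy_standin(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint64_t tmp;

    while (n >= sizeof tmp) {
        memcpy(&tmp, s, sizeof tmp);
        memcpy(d, &tmp, sizeof tmp);
        d += sizeof tmp;
        s += sizeof tmp;
        n -= sizeof tmp;
    }
    while (n--)              /* copy the tail byte by byte */
        *d++ = *s++;
    fpu_copies++;
    return dst;
}

/* Copy that honours the lockout and falls back to the integer path. */
void *guarded_copy(void *dst, const void *src, size_t n)
{
    if (fpu_copy_enabled)
        return fpu_copy_standin(dst, src, n);
    int_copies++;
    return memcpy(dst, src, n);
}
```

Whoever owns the FPU state would clear the flag around any region where
the FPU copy is unsafe (early boot, or anywhere the copy might sleep)
and set it again afterwards. A real SMP version would presumably need
more than a single global flag.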

Have you looked at using the integer unit to asynchronously touch
cache lines ahead of the FPU, btw?

On the read side, this wouldn't accomplish much (that I can figure
out). On the write side, it clobbers performance (just as I would
expect, and have measured).
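
For reference, a rough C sketch of the read-side variant of the idea:
one integer load per cache line, a few lines ahead of the data being
copied. LINE_SIZE, TOUCH_AHEAD and copy_touch_ahead are assumed names
and values chosen for illustration, not measured or tuned ones, and
memcpy stands in for the FPU copy of each line. Whether the touch
actually overlaps with the copy depends on the CPU.

```c
#include <stddef.h>
#include <string.h>

#define LINE_SIZE   32   /* assumed cache line size in bytes */
#define TOUCH_AHEAD 4    /* assumed: how many lines ahead to touch */

/* Copy n bytes one cache line at a time, reading a single byte from
   the source line TOUCH_AHEAD lines further on before each chunk. */
void *copy_touch_ahead(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const volatile unsigned char *touch = src;  /* volatile keeps the load */
    size_t off = 0;

    while (off < n) {
        size_t left = n - off;
        size_t chunk = left < LINE_SIZE ? left : LINE_SIZE;
        size_t ahead = off + (size_t)TOUCH_AHEAD * LINE_SIZE;

        if (ahead < n) {
            unsigned char t = touch[ahead];     /* integer load, value unused */
            (void)t;
        }
        memcpy(d + off, s + off, chunk);        /* stand-in for the FPU copy */
        off += chunk;
    }
    return dst;
}
```

The write-side variant would touch the destination instead, which
forces each destination line to be read into the cache before it is
overwritten.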

Now we need FPU copy/checksum 8). On a more serious note, the next
generation Intel CPUs with the 57 new "multimedia" instructions
like dot product will let us do even better block copies and also
copy/csums, going by the Intel blurb.

Completely agreed.

-- 
Robert Krawitz <rlk@tiac.net>           http://www.tiac.net/users/rlk/

Member of the League for Programming Freedom -- mail lpf@uunet.uu.net
Tall Clubs International -- tci-request@aptinc.com or 1-800-521-2512