Re: AES assembler optimizations

From: David S. Miller
Date: Tue Aug 10 2004 - 15:40:37 EST


On Tue, 10 Aug 2004 19:51:29 +0000 (UTC)
hpa@xxxxxxxxx (H. Peter Anvin) wrote:

> It's not really that hard, you just have to have enough work to
> amortize it over. The two metrics are: how much work do you get per
> call, and how much work do you get before the next schedule().

Someone might want to investigate using sparc64's FPU saving
scheme on x86, if possible. It might make the cut-off point
smaller.

On sparc64, we:

1) Always save the full FPU state at context switch time if it
is active.

2) On entry to a FPU-using kernel routine, we save the FPU if
it is active.

3) On exit from a FPU-using kernel routine, we do nothing
except mark the FPU as inactive.

4) FPU-disabled traps by the user restore the state saved
by #1 or #2.

Note that this means FPU state can be recursively saved.
For example, if an FPU memcpy takes an interrupt, and the interrupt
handler invokes an FPU memcpy, it works just fine.

This works extremely well for cases such as:

The user made the FPU active, but it is not going to
use the FPU for quite some time. The kernel can use
the FPU multiple times, and only needs to save state once.
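
To make the save-once behaviour concrete, here is a rough userspace
model of steps 1-4 above. It is only a sketch, not the sparc64 code:
every name in it (toy_thread, toy_hw_fpu, the toy_* functions) is made
up for illustration, the "FPU" is a plain array of doubles, and it
keeps a single saved state, so the recursive case is not modelled. It
also assumes a context switch leaves the FPU marked inactive.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FPU_REGS 4  /* toy register count, not the real sparc64 layout */

struct toy_fpu_state {
    double regs[TOY_FPU_REGS];
};

/* Hypothetical per-thread bookkeeping; all names are illustrative only. */
struct toy_thread {
    bool fpu_active;            /* live registers hold user state */
    bool have_saved;            /* 'saved' holds a valid copy */
    int save_count;             /* how many saves were performed */
    struct toy_fpu_state saved; /* single slot: nesting is not modelled */
};

/* Stands in for the hardware FPU register file. */
struct toy_fpu_state toy_hw_fpu;

void toy_save_if_active(struct toy_thread *t)
{
    if (!t->fpu_active)
        return;                 /* nothing live, so nothing to save */
    t->saved = toy_hw_fpu;
    t->have_saved = true;
    t->save_count++;
}

/* 1) Context switch: save the full state if the FPU is active. */
void toy_switch_out(struct toy_thread *t)
{
    toy_save_if_active(t);
    t->fpu_active = false;      /* assumption, see the text above */
}

/* 2) Entry to an FPU-using kernel routine: save if active. */
void toy_kernel_fpu_enter(struct toy_thread *t)
{
    toy_save_if_active(t);
}

/* 3) Exit from the routine: only mark the FPU inactive. */
void toy_kernel_fpu_exit(struct toy_thread *t)
{
    t->fpu_active = false;
}

/* 4) User touches the FPU while it is inactive: restore saved state. */
void toy_user_fpu_disabled_trap(struct toy_thread *t)
{
    if (t->have_saved)
        toy_hw_fpu = t->saved;
    t->fpu_active = true;
}

/* Stand-in for kernel code that overwrites the FPU registers. */
void toy_kernel_clobber(double v)
{
    for (int i = 0; i < TOY_FPU_REGS; i++)
        toy_hw_fpu.regs[i] = v;
}

/* User makes the FPU active, then the kernel uses it twice. */
void toy_demo(void)
{
    struct toy_thread t = { 0 };

    for (int i = 0; i < TOY_FPU_REGS; i++)
        toy_hw_fpu.regs[i] = 1.5 + i;   /* pretend user FPU contents */
    t.fpu_active = true;

    toy_kernel_fpu_enter(&t);   /* active: state is saved here */
    toy_kernel_clobber(-1.0);
    toy_kernel_fpu_exit(&t);

    toy_kernel_fpu_enter(&t);   /* inactive: no second save */
    toy_kernel_clobber(-2.0);
    toy_kernel_fpu_exit(&t);

    assert(t.save_count == 1);

    toy_user_fpu_disabled_trap(&t);     /* user touches the FPU again */
    assert(toy_hw_fpu.regs[0] == 1.5);
}
```

In toy_demo() the second toy_kernel_fpu_enter() finds the FPU already
marked inactive and skips the save, which is the case described above.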

It's worked extremely well in practice. We store the stack
of FPU states at the end of the thread_struct area. This
provides better cache behavior than storing it on the local
kernel stack each time the kernel wants to use the FPU (Solaris
on UltraSPARC chooses this method BTW).