Re: Lazy FPU restoration / moving kernel_fpu_end() to context switch

From: Thomas Gleixner
Date: Fri Jun 15 2018 - 12:25:50 EST


On Fri, 15 Jun 2018, Jason A. Donenfeld wrote:
> In a loop this looks like:
>
> for (thing) {
>     kernel_fpu_begin();
>     encrypt(thing);
>     kernel_fpu_end();
> }
>
> This is obviously very bad, because begin() and end() are slow, so
> WireGuard does the obvious:
>
> kernel_fpu_begin();
> for (thing)
>     encrypt(thing);
> kernel_fpu_end();
>
> This is fine and well, and the crypto API I'm working on will enable

It might be fine for crypto performance, but it's a total nightmare for
latency, because kernel_fpu_begin() disables preemption. We've seen
latencies in the larger millisecond range due to processing large data sets
with kernel FPU.

If you want to go there, then we really need a better approach: one which
allows kernel FPU usage in preemptible context and which, in case of
preemption, has a way to stash the preempted FPU context and restore it when
the task gets scheduled in again. Just using the existing FPU stuff, moving
the loops inside the begin/end section and keeping preemption disabled for
arbitrary time spans is not going to fly.
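
To make the stash/restore idea concrete, here is a minimal user-space
sketch in C. It is a toy model under loud assumptions, not kernel code:
struct task, cpu_fpu, kfpu_begin_preemptible(), kfpu_end_preemptible(),
sched_out(), sched_in() and context_switch() are all hypothetical names
that do not exist in the kernel, and the model ignores the user-space FPU
state which the real kernel_fpu_begin() has to preserve as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CPU's FPU/SIMD register file (assumption). */
#define NREGS 4
struct fpu_regs {
    unsigned long r[NREGS];
};

/* Hypothetical per-task state; not the kernel's task_struct. */
struct task {
    bool kfpu_active;          /* task is inside a kernel-FPU section */
    struct fpu_regs kfpu_save; /* stash for a preempted kernel FPU context */
};

/* The single "hardware" register file shared by all tasks. */
static struct fpu_regs cpu_fpu;

/* Hypothetical begin/end which only mark the section; preemption stays on. */
static void kfpu_begin_preemptible(struct task *t)
{
    assert(!t->kfpu_active); /* no nesting in this toy model */
    t->kfpu_active = true;
}

static void kfpu_end_preemptible(struct task *t)
{
    assert(t->kfpu_active);
    t->kfpu_active = false;
}

/* Scheduler hook: stash the kernel FPU context of the outgoing task. */
static void sched_out(struct task *prev)
{
    if (prev->kfpu_active)
        prev->kfpu_save = cpu_fpu;
}

/* Scheduler hook: restore the stashed context of the incoming task. */
static void sched_in(struct task *next)
{
    if (next->kfpu_active)
        cpu_fpu = next->kfpu_save;
}

static void context_switch(struct task *prev, struct task *next)
{
    sched_out(prev);
    sched_in(next);
}
```

In this model the cost of saving and restoring is only paid when a task
really gets preempted inside a kernel-FPU section, so the section itself
does not have to keep preemption disabled.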

Thanks,

tglx