Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

From: Ingo Molnar
Date: Tue Mar 20 2018 - 04:27:04 EST



* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> > Useful also for code that needs AVX-like registers to do things like CRCs.
>
> x86/crypto/ has a lot of AVX optimized code.

Yeah, that's true, but the crypto code is processing fundamentally bigger blocks
of data, which amortizes the cost of using kernel_fpu_begin()/_end().

kernel_fpu_begin()/_end() is a pretty heavy operation because it does a full FPU
save/restore via the XSAVE[S] and XRSTOR[S] instructions, which can easily copy a
thousand bytes around! So kernel_fpu_begin()/_end() is probably a non-starter for
something small, like a single 256-bit or 512-bit word access.
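
To put a rough number on "a thousand bytes": the XSAVE area is the 512-byte
legacy x87/SSE region plus a 64-byte header, plus one block per enabled extended
state component. The sketch below just adds up the architectural component
sizes. The helper name xsave_min_bytes() and its flag arguments are made up for
illustration, and the result is only a lower bound, since a real CPU may enable
further components (CPUID leaf 0xD reports the actual size):

```c
#include <assert.h>
#include <stddef.h>

/* Architectural sizes, in bytes, of the XSAVE state components used here. */
enum {
	XSAVE_LEGACY_BYTES    = 512,	/* x87 + SSE state (FXSAVE layout) */
	XSAVE_HEADER_BYTES    = 64,	/* XSAVE header */
	XSAVE_AVX_BYTES       = 256,	/* upper 128 bits of ymm0..ymm15 */
	XSAVE_OPMASK_BYTES    = 64,	/* k0..k7 */
	XSAVE_ZMM_HI256_BYTES = 512,	/* upper 256 bits of zmm0..zmm15 */
	XSAVE_HI16_ZMM_BYTES  = 1024,	/* zmm16..zmm31 */
};

/*
 * Hypothetical helper: lower bound on the bytes one XSAVE writes (and one
 * XRSTOR reads back) when the given feature sets are enabled.
 */
static size_t xsave_min_bytes(int has_avx, int has_avx512)
{
	size_t n = XSAVE_LEGACY_BYTES + XSAVE_HEADER_BYTES;

	/* AVX-512 state extends the AVX state, so it implies AVX. */
	assert(!has_avx512 || has_avx);

	if (has_avx)
		n += XSAVE_AVX_BYTES;
	if (has_avx512)
		n += XSAVE_OPMASK_BYTES + XSAVE_ZMM_HI256_BYTES +
		     XSAVE_HI16_ZMM_BYTES;

	return n;
}
```

For an AVX-capable CPU this gives at least 832 bytes written by the save and
another 832 read back by the restore, all to protect a single 32-byte access.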

But there's actually a new thing in modern kernels: we got rid of (most of) the
lazy FPU save/restore code, and our new x86 FPU model is very "direct", with no
FPU faults taken normally.

So assuming the target driver will only load on modern FPUs I *think* it should
actually be possible to do something like (pseudocode):

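# vmovdqa needs a 32-byte aligned address, so this assumes an aligned
# %rsp with the slots already reserved; otherwise use vmovdqu
vmovdqa %ymm0, 32(%rsp)
vmovdqa %ymm1, 64(%rsp)

...
# use ymm0 and ymm1
...

vmovdqa 64(%rsp), %ymm1
vmovdqa 32(%rsp), %ymm0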

... without using the heavy XSAVE/XRSTOR instructions.

Note that preemption probably still needs to be disabled and possibly there are
other details as well, but there should be no 'heavy' FPU operations.
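
As a rough illustration of the spill/use/restore idea, here is a user-space
sketch, not kernel code. It assumes GCC or Clang inline asm, and the names
struct u256 and io_copy256() are hypothetical. It spills %ymm0 to an aligned
stack slot, uses it for a single 256-bit copy, then restores it. When not built
for x86-64 with AVX enabled it falls back to memcpy(). It does nothing about
preemption or the other kernel-side details mentioned above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 256-bit word, aligned so that vmovdqa can be used on it. */
struct u256 {
	_Alignas(32) uint8_t b[32];
};

/* Hypothetical helper: copy one 256-bit word, borrowing %ymm0 for the move. */
static void io_copy256(struct u256 *dst, const struct u256 *src)
{
	/* vmovdqa faults on addresses that are not 32-byte aligned. */
	assert(((uintptr_t)dst & 31) == 0 && ((uintptr_t)src & 31) == 0);

#if defined(__x86_64__) && defined(__AVX__)
	struct u256 spill;	/* aligned stack slot for the old %ymm0 value */

	__asm__ volatile(
		"vmovdqa %%ymm0, %[spill]\n\t"	/* save the current %ymm0 */
		"vmovdqa %[src], %%ymm0\n\t"	/* 256-bit load */
		"vmovdqa %%ymm0, %[dst]\n\t"	/* 256-bit store */
		"vmovdqa %[spill], %%ymm0\n\t"	/* put %ymm0 back */
		: [spill] "=m" (spill), [dst] "=m" (*dst)
		: [src] "m" (*src));
#else
	/* Portable fallback when AVX is not enabled at build time. */
	memcpy(dst, src, sizeof(*dst));
#endif
}
```

Because %ymm0 is saved and restored inside one asm statement, the compiler does
not need to be told about a register clobber. Whether the same trick is safe
against everything the kernel can do in between is exactly the open question.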

I think this should still preserve all user-space FPU state, and it shouldn't muck
up any 'weird' user-space FPU state we might have interrupted either (such as
pending exceptions, legacy x87 code in mid-execution, NaN registers or weird FPU
control word settings).

But I could be wrong, it should be checked whether this sequence is safe.
Worst-case we might have to save/restore the FPU control and tag words - but those
operations should still be much faster than a full XSAVE/XRSTOR pair.

So I do think we could do more in this area to improve driver performance, if the
code is correct and if there are actual benchmarks showing real benefits.

Thanks,

Ingo