Re: [RFC PATCH 1/2] x86/fpu: make kernel-mode FPU reliably usable in softirqs

From: Dave Hansen
Date: Wed Feb 26 2025 - 12:09:39 EST


On 2/25/25 14:59, Eric Biggers wrote:
> If we had to save/restore a large number of vector registers in every crypto
> function call (not amortized to one save/restore per return to userspace), that
> would be a big performance problem.

I just did a quick trace on my laptop. Looks like I have two main
kernel_fpu_begin() users: LUKS and networking. They both very much seem
to do a bunch of kernel_fpu_begin() operations but very few actual XSAVEs:

26 : save_fpregs_to_fpstate <-kernel_fpu_begin_mask
818 : kernel_fpu_begin_mask <-crc32c_pcl_intel_update
4192 : kernel_fpu_begin_mask <-xts_encrypt_vaes_avx10_256

This is at least _one_ data point very much in favor of Eric's argument
here. It appears that the cost of one XSAVE is amortized across a
bunch of kernel_fpu_begin()s.
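
To make the amortization concrete, here is a toy user-space model of
why so few saves show up. It is a sketch, not the kernel code: struct
fpu_model and the model_*() helpers are made up for illustration, and a
single bool stands in for TIF_NEED_FPU_LOAD. It assumes a user task
(kernel threads have no user FPU state to save in the first place).

```c
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct fpu_model {
	bool need_fpu_load;	/* stand-in for TIF_NEED_FPU_LOAD */
	unsigned long begins;	/* modeled kernel_fpu_begin() calls */
	unsigned long saves;	/* modeled XSAVEs */
};

/* Save user FPU state only if it is still live in the registers. */
static void model_kernel_fpu_begin(struct fpu_model *m)
{
	m->begins++;
	if (!m->need_fpu_load) {
		m->saves++;			/* the one real XSAVE */
		m->need_fpu_load = true;	/* state now lives in memory */
	}
}

/* Nothing is restored here; the restore is deferred to user return. */
static void model_kernel_fpu_end(struct fpu_model *m)
{
	(void)m;
}

/* Returning to userspace reloads the registers and clears the flag. */
static void model_return_to_user(struct fpu_model *m)
{
	m->need_fpu_load = false;
}
```

Running 100 begin/end pairs before each of 3 returns to userspace
leaves the model with 300 begins and 3 saves. The trace above has 5010
begins (818 + 4192) against 26 saves, so roughly 190 begins per XSAVE.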