Re: [patch 00/41] x86/fpu: Spring cleaning and PKRU sanitizing

From: Thomas Gleixner
Date: Fri Jun 11 2021 - 20:24:22 EST


On Fri, Jun 11 2021 at 18:15, Thomas Gleixner wrote:
> - Removal of PKRU from being XSTATE managed in the kernel because PKRU
> has to be eagerly restored on context switch and keeping it in sync
> in the xstate buffer is just pointless overhead and fragile.

Just to preempt any complaints about the resulting inconsistency
vs. xgetbv(1) in the case that the PKRU value is 0:

That inconsistency is simply an INTEL-only hardware bug and there is no
way to make this consistent, ever, no matter what kind of mechanism the
kernel uses. This inconsistency can be demonstrated in user space w/o
any kernel interaction.

The Intel SDM states in volume 1, chapter 13.6

PROCESSOR TRACKING OF XSAVE-MANAGED STATE

* PKRU state. PKRU state is in its initial configuration if the value
of the PKRU is 0.

But that's just not true.

wrpkru(0);
assert(!(xgetbv(1) & XFEATURE_PKRU));

fails on Intel but not on AMD AFAIK. xgetbv(1) returns the 'INUSE'
bitmap of xstate managed features.
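
For anyone who wants to try this in user space, here is a rough sketch of
the check above. It is not taken from the patch series. The helper names,
the XFEATURE_MASK_PKRU define (PKRU is state component 9) and the CPUID
guards are assumptions made for illustration, and it needs GCC or Clang
on x86. It returns -1 when the CPU or OS lacks PKU or XGETBV with ECX=1,
otherwise whether XINUSE still reports PKRU after writing 0.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang CPUID helpers */
#define X86_DEMO 1
#endif

/* Assumption for this sketch: PKRU is XSAVE state component 9. */
#define XFEATURE_MASK_PKRU (1ULL << 9)

#ifdef X86_DEMO
/* Only run the instructions if they cannot fault. */
static int cpu_can_run_demo(void)
{
    unsigned int a, b, c, d;

    /* CPUID.1:ECX[27] = OSXSAVE, needed for XGETBV */
    if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 27)))
        return 0;
    /* CPUID.(7,0):ECX[4] = OSPKE, needed for RDPKRU/WRPKRU */
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d) || !(c & (1u << 4)))
        return 0;
    /* CPUID.(0xD,1):EAX[2] = XGETBV with ECX=1 supported */
    if (!__get_cpuid_count(0xd, 1, &a, &b, &c, &d) || !(a & (1u << 2)))
        return 0;
    return 1;
}

static uint64_t xgetbv(uint32_t index)
{
    uint32_t lo, hi;

    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(index));
    return ((uint64_t)hi << 32) | lo;
}

static uint32_t rdpkru(void)
{
    uint32_t pkru, edx;

    /* RDPKRU, encoded as bytes for older assemblers; ECX must be 0 */
    __asm__ volatile(".byte 0x0f,0x01,0xee"
                     : "=a"(pkru), "=d"(edx) : "c"(0));
    return pkru;
}

static void wrpkru(uint32_t pkru)
{
    /* WRPKRU, encoded as bytes; ECX and EDX must be 0 */
    __asm__ volatile(".byte 0x0f,0x01,0xef"
                     : : "a"(pkru), "c"(0), "d"(0) : "memory");
}
#endif

/* Returns 1 if XINUSE reports PKRU after wrpkru(0), 0 if not, -1 if untestable. */
int pkru_zero_still_inuse(void)
{
#ifdef X86_DEMO
    uint32_t saved;
    int inuse;

    if (!cpu_can_run_demo())
        return -1;

    saved = rdpkru();
    wrpkru(0);
    inuse = !!(xgetbv(1) & XFEATURE_MASK_PKRU);
    wrpkru(saved);              /* put the original PKRU value back */
    return inuse;
#else
    return -1;
#endif
}
```

Per the description above, a return value of 1 would be the Intel
behaviour and 0 the behaviour expected on AMD. What a given machine
actually returns has to be checked on that machine.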

But the Intel SDM is blurry about this:

XINUSE denotes the state-component bitmap corresponding to the init
optimization. If XINUSE[i] = 0, state component i is known to be in
its initial configuration; otherwise XINUSE[i] = 1. It is possible for
XINUSE[i] to be 1 even when state component i is in its initial
configuration. On a processor that does not support the init
optimization, XINUSE[i] is always 1 for every value of i.

IOW there is no consistency vs. XINUSE and initial state guaranteed at
all. So why should the kernel worry about this?
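
To spell out what software may conclude from that SDM wording, a minimal
sketch (the enum and function names are made up for illustration): a
clear bit proves the component is in its initial configuration, a set
bit proves nothing either way.

```c
#include <stdint.h>

enum init_knowledge {
    KNOWN_INIT,     /* XINUSE[i] == 0: component is in its init state */
    UNKNOWN,        /* XINUSE[i] == 1: may or may not be in init state */
};

/* Interpret bit i of an XINUSE bitmap as returned by xgetbv(1). */
enum init_knowledge classify_component(uint64_t xinuse, unsigned int i)
{
    if (i < 64 && !(xinuse & (1ULL << i)))
        return KNOWN_INIT;
    return UNKNOWN;
}
```

So code which treats a set PKRU bit as "PKRU != 0" reads more into
XINUSE than the quoted text promises.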

We just use the most optimized way to deal with this and that's what
this patch series is doing by removing PKRU from xstate management in
the kernel.

If anyone cares about consistency of XINUSE vs. the actual component
state then please redirect the complaints to INTEL.

Either the hardware folks get their act together or software which
relies on consistency (cough, cough) like rr has to cope with it.

Making the kernel pretend that all of this is consistent under all
circumstances is a futile attempt to ignore reality.

This inconsistency can only be fixed in hardware/ucode. End of story.

Thanks,

tglx