Re: [patch 00/41] x86/fpu: Spring cleaning and PKRU sanitizing

From: Dave Hansen
Date: Wed Jun 16 2021 - 16:55:22 EST


On 6/11/21 5:24 PM, Thomas Gleixner wrote:
> The Intel SDM states in volume 1, chapter 13.6
>
> PROCESSOR TRACKING OF XSAVE-MANAGED STATE
>
> * PKRU state. PKRU state is in its initial configuration if the value
> of the PKRU is 0.
>
> But that's just not true.
>
> wrpkru(0)
> assert(!(xgetbv(1) & XFEATURE_PKRU);

Hi Thomas,

It's pretty clear that Intel's implementation was intentional. It was
certainly no accident that it was implemented this way.

I'm a bit confused why you expected to see XINUSE[PKRU]=0 up in your
example. The CPU is *free* to set XINUSE[PKRU]=0, but it appears that
the example expects that it *must* set XINUSE[PKRU]=0.

I do wish the SDM had been excruciatingly explicit in defining:

initial configuration
vs.
initial state

All features in their "initial state" have their "initial configuration"
values, but not all features with their "initial configuration" values
are in their "initial state".

> But the Intel SDM is blury about this:
>
> XINUSE denotes the state-component bitmap corresponding to the init
> optimization. If XINUSE[i] = 0, state component i is known to be in
> its initial configuration; otherwise XINUSE[i] = 1. It is possible for
> XINUSE[i] to be 1 even when state component i is in its initial
> configuration. On a processor that does not support the init
> optimization, XINUSE[i] is always 1 for every value of i.
>
> IOW there is no consistency vs. XINUSE and initial state guaranteed at
> all. So why should the kernel worry about this?

Exactly, there is really no consistency guarantee. This was written to
give the CPU designers some flexibility so that they could opt to omit
"init tracker" hardware if they chose. Or, so that they could be a bit
lazy about implementing one.

Imagine what would happen if the AMD PKRU init tracking behavior (write
all 0's, get XINUSE[PKRU]=0) was *required* XSAVE behavior. Every ZMM
register write would potentially need to go checking ~2k of state to see
if the rest of the state is all 0's.

> If anyone cares about consistency of XINUSE vs. the actual component
> state then please redirect the complaints to INTEL.

I think we can take a _bit_ of the blame on the kernel side too. The
kernel has very good reasons for managing PKRU with WRPKRU instead of
XSAVE. *But*, it also tossed out XINUSE[PKRU] consistency in the process.

I'm not sure we should be looking to the hardware to bring that back.

> Either the hardware folks get their act together or software which
> relies on consistency (cough, cough) like rr has to cope with it.
>
> Making the kernel to pretend that all of this is consistent under all
> circumstances is a futile attempt to ignore reality.
>
> This inconsistency can only be fixed in hardware/ucode. End of story.

I agree with this. If it's going to be fixed, the kernel simply doesn't
have the tools to do it. We either need new ISA or new hardware/ucode.

I'm just not convinced it's worth fixing for PKRU.