Re: [patch V2 00/14] x86/fpu: Mop up XSAVES and related damage

From: Dave Hansen
Date: Tue Jun 08 2021 - 10:47:55 EST


On 6/7/21 3:51 PM, Thomas Gleixner wrote:
...
> But it creates a few new problems:
>
> 1) Where to put the PKRU value in the sigframe?
>
> For 64bit sigframes that's easy as there is padding space, for
> 32bit sigframes that's a problem because there is no space.
>
> 2) Backward compatibility
>
> As much as we wish to have a time machine there is a rule not to
> break existing user space.
>
> Now fortunately there is a way out:
>
> 1) User space cannot rely on PKRU being XSTATE managed unless PKRU is
> enabled in XCR0. XCR0 enablement is part of the UABI so any
> complaint about missing XCR0 support is futile

So... One more gem from the manpages:

> It is recommended that
> applications wanting to use protection keys should simply call
> pkey_alloc(2) and test whether the call succeeds, instead of
> attempting to detect support for the feature in any other way.

I kinda wrote that thinking that folks could avoid doing the
CPUID/XGETBV dance and just use the syscall instead. *If* they do what
is suggested, they'll never notice the lack of PKRU in XCR0.
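
The manpage recommendation can be sketched roughly as below. This is a minimal illustration, not code from the selftest or the manpage: the helper name pkeys_usable() is hypothetical, and it uses the raw syscall numbers so that it does not depend on a glibc wrapper being available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe for pkeys by trying to allocate one,
 * without looking at CPUID or XCR0.
 * Returns 1 if a key could be allocated, 0 otherwise.
 */
static int pkeys_usable(void)
{
#ifdef SYS_pkey_alloc
	/* pkey_alloc(flags = 0, access_rights = 0) */
	long pkey = syscall(SYS_pkey_alloc, 0UL, 0UL);

	if (pkey < 0)
		return 0;	/* any failure is treated as "not usable" */

	/* The key was only needed for the probe, so give it back. */
	syscall(SYS_pkey_free, pkey);
	return 1;
#else
	/* Headers too old to know the syscall number. */
	errno = ENOSYS;
	return 0;
#endif
}
```

An application that probes this way never executes XGETBV, so it has no way to observe whether the PKRU bit is set in XCR0.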

The pkey selftest, for instance, blindly assumes that pkeys is enabled
in XCR0. It would probably end up scribbling somewhere on the stack.
Now the same person who wrote that also wrote the manpages, so those are
not exactly two separate data points.

...
> So the proposed solution is to:
>
> A) Exclude PKRU from XSTATE managed state, i.e. do not set the PKRU
> bit in XCR0
>
> B) Exclude 32bit applications on 64bit kernels from using PKEYS by
> returning an error code from pkey_alloc(). That's fine because the
> man page requires them to handle the failure which they need to do
> anyway because 32bit kernels do not support PKEYS and never will.
>
> C) Replace the current context switch mechanism which is partially
> XSAVE based by a software managed one.
>
> D) Store the PKRU value in one of the reserved slots of the 64bit
> signal frame which is possible because of #B so that a signal
> handler has the chance to override the interrupted task's PKRU
> setting.
>
> Thoughts?

The thing that makes me most nervous is changing the signal stack ABI
for PKRU. More careful apps (unlike the selftest) will probably do
proper enumeration and might bug out due to the missing XCR0 bit. Or,
they might at least check xfeatures (aka XSTATE_BV) in the signal stack
XSAVE buffer.
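
For illustration, the kind of xfeatures check such an app might do could look like the sketch below. The function name and macros are hypothetical. It assumes the standard XSAVE layout, where the XSAVE header follows the 512-byte legacy region and starts with the 64-bit XSTATE_BV field, and that PKRU is state component 9. A real signal handler would first have to confirm that the frame carries extended state at all (the software-reserved bytes and their magic value), which this sketch skips.

```c
#include <stdint.h>
#include <string.h>

#define XSAVE_HDR_OFFSET	512	/* size of the legacy FXSAVE region */
#define XFEATURE_PKRU_BIT	9	/* PKRU state component number */

/*
 * Hypothetical helper: returns 1 if the PKRU bit is set in the
 * XSTATE_BV field of the XSAVE area at @xsave_area, 0 otherwise.
 */
static int xsave_has_pkru(const void *xsave_area)
{
	uint64_t xstate_bv;

	/* XSTATE_BV is the first 8 bytes of the XSAVE header. */
	memcpy(&xstate_bv, (const uint8_t *)xsave_area + XSAVE_HDR_OFFSET,
	       sizeof(xstate_bv));

	return (int)((xstate_bv >> XFEATURE_PKRU_BIT) & 1);
}
```

With PKRU excluded from XCR0, a check like this would never see the bit set, which is one way an existing app could notice the change.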

On the bright side, rudely masking PKRU out of XCR0:

	xcr0 &= ~XFEATURE_MASK_PKRU;

still results in a kernel that boots.
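
As a standalone illustration of what that mask does to an XCR0 value (a userspace sketch, not the kernel code; the macro is redefined locally on the assumption that PKRU is XSAVE state component 9, and mask_out_pkru() is a hypothetical name):

```c
#include <stdint.h>

/* Local stand-in for the kernel macro: PKRU is state component 9. */
#define XFEATURE_MASK_PKRU	(1ULL << 9)

/* Hypothetical helper: clear the PKRU bit, leave all other bits alone. */
static uint64_t mask_out_pkru(uint64_t xcr0)
{
	return xcr0 & ~XFEATURE_MASK_PKRU;
}
```

For example, an input of 0x207 (x87, SSE, AVX and PKRU) comes back as 0x7.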