[RFC] Restore PKRU to user-defined value after signal handling

From: Aruna Ramakrishna
Date: Wed Nov 06 2024 - 13:37:34 EST


Hello,

The following commit, which is part of 6.12 rc, does not work consistently on systems with AMD processors vs. Intel:

70044df250d0: x86/pkeys: Update PKRU to enable all pkeys before XSAVE

This zeroes out the pkeys in handle_signal() by calling sig_prepare_pkru():

/*
* Enable all pkeys temporarily, so as to ensure that both the current
* execution stack as well as the alternate signal stack are writeable.
* The application can use any of the available pkeys to protect the
* alternate signal stack, and we don't know which one it is, so enable
* all. The PKRU register will be reset to init_pkru later in the flow,
* in fpu__clear_user_states(), and it is the application's responsibility
* to enable the appropriate pkey as the first step in the signal handler
* so that the handler does not segfault.
*/
static inline u32 sig_prepare_pkru(void)
{
u32 orig_pkru = read_pkru();

write_pkru(0);
return orig_pkru;
}

The write_pkru(0) call seems to set xinuse[9] to 0 on systems with AMD CPUs (but not Intel), which means the user-defined PKRU value overwritten in the sigframe (in update_pkru_in_sigframe()) is not restored by XRSTOR and the PKRU value stays at 0 when it returns back to userspace. Which is unexpected.

AMD:

$ ./handler-pkru
startup pkru = 0x55555554
changed in main thread pkru = 0xfffffff0
received signal 10
in signal handler pkru = 0x55555554
after usr1 signal pkru = 0x00000000



xcr0 207
xcr0 AND xinuse 202
writing pkru 0
xcr0 207
xcr0 AND xinuse 2


Intel:

$ ./handler-pkru
startup pkru = 0x55555554
changed in main thread pkru = 0xfffffff0
received signal 10
in signal handler pkru = 0x55555554
after usr1 signal pkru = 0xfffffff0



xcr0 2E7
xcr0 AND xinuse 2A2
writing pkru 0
xcr0 2E7
xcr0 AND xinuse 2A2

From the Intel manual:


13.6 PROCESSOR TRACKING OF XSAVE-MANAGED STATE

The following notation describes the state of the init and modified optimizations:
• XINUSE denotes the state-component bitmap corresponding to the init optimization. If XINUSE[i] = 0, state component i is known to be in its initial configuration; otherwise XINUSE[i] = 1. It is possible for XINUSE[i] to be 1 even when state component i is in its initial configuration. On a processor that does not support the init optimization, XINUSE[i] is always 1 for every value of i.
...

• PKRU state. PKRU state is in its initial configuration if the value of the PKRU is 0.
...

13.8.1 Standard Form of XRSTOR

XRSTOR updates state component i based on the value of bit i in the XSTATE_BV field of the XSAVE header:
• If XSTATE_BV[i] = 0, the state component is set to its initial configuration. Section 13.6 specifies the initial configuration of each state component.
The initial configuration of state component 1 pertains only to the XMM registers and not to MXCSR. See below for the treatment of MXCSR
• If XSTATE_BV[i] = 1, the state component is loaded with data from the XSAVE area. See Section 13.5 for specifics for each state component and for details regarding mode-specific operation and operation determined by instruction prefixes. See Section 13.13 for details regarding faults caused by memory accesses.


The line “PKRU state is in its initial configuration if the value of the PKRU is 0” seems to imply that when the PKRU register is set to 0, xinuse[9] is also automatically set to 0 and that is expected behavior, which causes XRSTOR to not load the register value from XSAVE area. But we do not want xinuse[9] to be set to 0 here, as we want the PKRU value to be correctly restored from the sigframe - otherwise it becomes a security issue.

I’m not really sure of the correct way to reset xinuse[9] to 1 (after wrpkru(0)) - but something like this seems to work (thanks to Rudi Horn for both finding the issue and suggesting this patch):

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 1065ab995305..701a163f0ac5 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -68,9 +68,35 @@ static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
*/
static inline int update_pkru_in_sigframe(struct xregs_state __user *buf, u32 pkru)
{
+ int err = 0;
+
if (unlikely(!cpu_feature_enabled(X86_FEATURE_OSPKE)))
return 0;
- return __put_user(pkru, (unsigned int __user *)get_xsave_addr_user(buf, XFEATURE_PKRU));
+
+ if (pkru != 0) {
+ err = __put_user(pkru,
+ (unsigned int __user *)get_xsave_addr_user(
+ buf, XFEATURE_PKRU));
+ u64 xfeatures;
+ u64 __user *xfeaturesp = &buf->header.xfeatures;
+
+ err |= __get_user(xfeatures, xfeaturesp);
+
+ /*
+ * On some systems, when PKRU is set to 0, the corresponding
+ * XINUSE bit is also zeroed out, which causes XRSTOR to not
+ * load the register value from XSAVE area. Which means the
+ * PKRU value that was updated on the sigframe will be
+ * effectively discarded.
+ *
+ * Mark PKRU as in use so that it is restored correctly.
+ */
+ if (!err & !(xfeatures & XFEATURE_MASK_PKRU)) {
+ xfeatures |= XFEATURE_MASK_PKRU;
+ err |= __put_user(xfeatures, xfeaturesp);
+ }
+ }
}

I’ve tested a version of this patch on both AMD and Intel systems and it works.

Please let me know if this is acceptable, or if there’s a better way to do this.

Thanks,
Aruna