Re: [PATCH v3 2/2] x86/pkeys: Set XSTATE_BV[PKRU] to 1 so that PKRU is XRSTOR'd correctly

From: Aruna Ramakrishna
Date: Mon Dec 02 2024 - 13:34:19 EST

> On Nov 22, 2024, at 4:10 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 11/19/24 09:45, Aruna Ramakrishna wrote:
>> The PKRU value is not XRSTOR'd from the XSAVE area if the corresponding
>> XSTATE_BV[i] bit is 0. A wrpkru(0) sets XSTATE_BV[PKRU] to 0 on AMD
>> systems, which means the PKRU value updated on the sigframe later on,
>> in update_pkru_in_sigframe(), is ignored.
>>
>> To make this behavior consistent across Intel and AMD systems, and to
>> ensure that the PKRU value updated on the sigframe is always restored
>> correctly, explicitly set XSTATE_BV[PKRU] to 1.
>>
>> Fixes: 70044df250d0 ("x86/pkeys: Update PKRU to enable all pkeys before XSAVE")
>>
>> Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@xxxxxxxxxx>
>> Suggested-by: Rudi Horn <rudi.horn@xxxxxxxxxx>
>
> I still think this changelog needs quite a bit of work for someone to
> make sense of this if they read it in a year. Perhaps:
>
> --
>
> When XSTATE_BV[i] is 0 and XRSTOR attempts to restore state component
> 'i', it ignores any value in the XSAVE buffer and instead restores the
> state component's init value.
>
> This means that if XSAVE writes XSTATE_BV[PKRU]=0 then XRSTOR will
> ignore the value that update_pkru_in_sigframe() writes to the XSAVE buffer.
>
> XSTATE_BV[PKRU] only gets written as 0 if PKRU is in its init state. On
> Intel CPUs, this basically never happens because the kernel usually
> overwrites the init value (aside: this is why we didn't notice this bug
> until now). But on AMD, the init tracker is more aggressive and will
> track PKRU as being in its init state upon any wrpkru(0x0).
> Unfortunately, sig_prepare_pkru() does just that: wrpkru(0x0).
>
> To fix this, always overwrite the sigframe XSTATE_BV with a value that
> has XSTATE_BV[PKRU]==1. This ensures that XRSTOR will not ignore what
> update_pkru_in_sigframe() wrote.
>
> The problematic sequence of events is something like this:
>
> Userspace does:
> * wrpkru(0xffff0000) (or whatever)
> * Hardware sets: XINUSE[PKRU]=1
> Signal happens, kernel is entered:
> * sig_prepare_pkru() => wrpkru(0x00000000)
> * Hardware sets: XINUSE[PKRU]=0 (aggressive AMD init tracker)
> * XSAVE writes most of XSAVE buffer, including
> XSTATE_BV[PKRU]=XINUSE[PKRU]=0
> * update_pkru_in_sigframe() overwrites PKRU in the XSAVE buffer
> ... signal handling
> * XRSTOR sees XSTATE_BV[PKRU]==0, ignores just-written value
> from update_pkru_in_sigframe()
>
> But otherwise, I think the code is fine:
>
> Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> I can fix up the changelog at application time if everyone is OK with it.

Thank you Dave. I agree, this reads better.

I’m not sure whether I should send out a v4 with the updated changelog.

Thanks,
Aruna