Re: arch_set_user_pkey_access only works on the current task_struct
From: Dave Hansen
Date: Tue Jun 08 2021 - 10:55:37 EST
On 6/7/21 8:16 PM, liangjs wrote:
> On Mon, 2021-06-07 at 10:52 -0700, Dave Hansen wrote:
>> On 6/5/21 6:10 AM, Jiashuo Liang wrote:
>>> I am learning the kernel implementation of the x86 PKU feature. I found that the
>>> arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
>>> use its first parameter. Is this perhaps a bug?
>> I wouldn't really call it a bug. But, yes, it is something we should
>> clean up.
> Should we remove the tsk parameter, or allow it to change the PKRU of tsk?
Probably just remove the parameter.
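For illustration, here is a userspace model of what the function computes once the unused tsk parameter is dropped. It is a sketch, not the kernel source. cpu_pkru and set_user_pkey_access() are hypothetical names, the register is a plain global, and the constants mirror the x86 values of PKRU_AD_BIT, PKRU_WD_BIT, PKRU_BITS_PER_PKEY, PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE. Without tsk, the signature matches what the code already does: it only ever touches the caller's PKRU.

```c
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for the live PKRU register of the calling thread. */
static uint32_t cpu_pkru;

static uint32_t read_pkru(void) { return cpu_pkru; }
static void write_pkru(uint32_t val) { cpu_pkru = val; }

/* Values mirror the x86 kernel and uapi definitions. */
#define PKRU_AD_BIT          0x1u
#define PKRU_WD_BIT          0x2u
#define PKRU_BITS_PER_PKEY   2
#define PKEY_DISABLE_ACCESS  0x1
#define PKEY_DISABLE_WRITE   0x2
#define NR_PKEYS             16   /* x86 provides 16 protection keys */

/*
 * Hypothetical model of arch_set_user_pkey_access() without tsk:
 * read-modify-write of the caller's own PKRU, nothing else.
 */
static int set_user_pkey_access(int pkey, unsigned long init_val)
{
	uint32_t new_bits = 0, old;
	int shift;

	if (pkey < 0 || pkey >= NR_PKEYS)
		return -EINVAL;

	if (init_val & PKEY_DISABLE_ACCESS)
		new_bits |= PKRU_AD_BIT;
	if (init_val & PKEY_DISABLE_WRITE)
		new_bits |= PKRU_WD_BIT;

	/* Move the bits to this key's slot and clear its old bits. */
	shift = pkey * PKRU_BITS_PER_PKEY;
	old = read_pkru() & ~((PKRU_AD_BIT | PKRU_WD_BIT) << shift);

	write_pkru(old | (new_bits << shift));
	return 0;
}
```

For example, starting from PKRU=0, set_user_pkey_access(2, PKEY_DISABLE_WRITE) leaves PKRU at 0x20.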
By the way, there's a big PKRU rework in progress. It might be best to
wait until the dust settles to poke at this.
> By the way, we are calling write_pkru, which changes both the CPU's PKRU
> and the xsave one. Why is this necessary?
PKRU affects kernel accesses to user memory. That means you can't
run the *kernel* with an out-of-date PKRU, hence the write_pkru().
Returning to userspace blindly restores the *WHOLE* XSAVE buffer to the
registers. If you don't update the XSAVE buffer, the value written by
write_pkru() will be overwritten before returning to userspace.
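A toy model of the two copies makes this concrete. It is a deliberate simplification: struct toy_task, cpu_pkru, write_pkru_reg_only(), write_pkru_both() and return_to_user() are hypothetical names, and the kernel's real FPU code (which can defer the XSAVE restore) is not modeled.

```c
#include <stdint.h>

/* Toy task: only the PKRU image saved in its XSAVE buffer. */
struct toy_task {
	uint32_t xsave_pkru;
};

/* Toy stand-in for the live PKRU register. */
static uint32_t cpu_pkru;

/* Update only the register: kernel accesses see the new value now. */
static void write_pkru_reg_only(uint32_t val)
{
	cpu_pkru = val;
}

/* Update the register and the saved copy, as write_pkru() is described above. */
static void write_pkru_both(struct toy_task *t, uint32_t val)
{
	cpu_pkru = val;
	t->xsave_pkru = val;
}

/* Returning to userspace restores the whole XSAVE image to the registers. */
static void return_to_user(const struct toy_task *t)
{
	cpu_pkru = t->xsave_pkru;
}
```

A register-only update is undone at the next return_to_user(); updating both copies survives it.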
> If I want to change PKRU of a task_struct other than current, do I still
> need to call __write_pkru?
No. You can't do that. Seriously.
The protection keys architecture really doesn't support off-thread
manipulation of PKRU. Imagine you want to mask bits out of PKRU to make
key 2 memory accessible and writable. You would do the following:
reg = read_pkru();
reg &= ~0x30;   // clear the AD and WD bits of key 2
write_pkru(reg);
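The 0x30 and 0x300 constants come from the PKRU layout: each key owns two bits, Access-Disable at bit 2*pkey and Write-Disable at bit 2*pkey+1. A small sketch deriving those masks follows; the helper names are hypothetical, not kernel APIs.

```c
#include <stdint.h>

/* AD (Access-Disable) bit of a key: bit 2*pkey. */
static uint32_t pkey_ad_bit(int pkey) { return 1u << (2 * pkey); }

/* WD (Write-Disable) bit of a key: bit 2*pkey + 1. */
static uint32_t pkey_wd_bit(int pkey) { return 1u << (2 * pkey + 1); }

/* Both bits of one key; clearing them allows reads and writes. */
static uint32_t pkey_mask(int pkey)
{
	return pkey_ad_bit(pkey) | pkey_wd_bit(pkey);
}

/* The masking step above, expressed in terms of the key number. */
static uint32_t allow_all_access(uint32_t pkru, int pkey)
{
	return pkru & ~pkey_mask(pkey);
}
```

So pkey_mask(2) is 0x30 and pkey_mask(4) is 0x300, the two values used in this example.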
Now imagine that you interrupt this poor task in the middle of that
operation to *set* the bits for key 4, effectively:
pkru |= 0x300;
Now say you do that key-4 update from an IPI that lands between the
task's read and its write:
reg = read_pkru();    // PKRU=0x30
reg &= ~0x30;         // reg=0x0
-> IPI
ipireg = read_pkru(); // PKRU=0x30
ipireg |= 0x300;
write_pkru(ipireg);   // PKRU=0x330
<- return from IPI
write_pkru(reg);      // PKRU=0x0
You *LOST* the update from the IPI.
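The interleaving above can be replayed deterministically in userspace. This is a sketch: pkru is a plain global standing in for the register, read_pkru() and write_pkru() are toy stand-ins for the kernel helpers of the same name, and ipi_set_key4() is a hypothetical handler called at the point where the IPI would land. replay_serialized() runs the same two updates without the interruption, for comparison.

```c
#include <stdint.h>

/* Toy stand-in for the live PKRU register of the task's CPU. */
static uint32_t pkru;

static uint32_t read_pkru(void) { return pkru; }
static void write_pkru(uint32_t val) { pkru = val; }

/* Hypothetical IPI handler: set the AD and WD bits of key 4. */
static void ipi_set_key4(void)
{
	uint32_t ipireg = read_pkru();

	ipireg |= 0x300;
	write_pkru(ipireg);
}

/* Task reads, IPI lands before the write-back, stale write wins. */
static uint32_t replay_race(uint32_t initial)
{
	uint32_t reg;

	pkru = initial;
	reg = read_pkru();
	reg &= ~0x30u;          /* clear key 2 in the local copy */
	ipi_set_key4();         /* interrupt arrives here */
	write_pkru(reg);        /* overwrites the IPI's update */
	return read_pkru();
}

/* Same two updates, but the task's read-modify-write completes first. */
static uint32_t replay_serialized(uint32_t initial)
{
	uint32_t reg;

	pkru = initial;
	reg = read_pkru();
	reg &= ~0x30u;
	write_pkru(reg);
	ipi_set_key4();
	return read_pkru();
}
```

Starting from 0x30, the racy replay ends at 0x0 while the serialized one ends at 0x300, so the key-4 bits are lost only because the IPI landed between the read and the write.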