Re: arch_set_user_pkey_access only works on the current task_struct

From: Dave Hansen
Date: Tue Jun 08 2021 - 10:55:37 EST


On 6/7/21 8:16 PM, liangjs wrote:
> On Mon, 2021-06-07 at 10:52 -0700, Dave Hansen wrote:
>> On 6/5/21 6:10 AM, Jiashuo Liang wrote:
>>> I am learning the kernel implementation of the x86 PKU feature. I find the
>>> arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
>>> use its first parameter. So it is perhaps a bug?
>> I wouldn't really call it a bug.  But, yes, it is something we should
>> clean up.
> Should we remove the tsk parameter, or allow it to change the PKRU of tsk?

Probably just remove the parameter.

By the way, there's a big PKRU rework in progress. It might be best to
wait until the dust settles to poke at this.

> By the way, we are calling write_pkru, which changes both the CPU's PKRU
> and the xsave one. Why is this necessary?

PKRU affects kernel accesses to user memory. That means that you can't
run the *kernel* with an out-of-date PKRU, thus the write_pkru().

Returning to userspace blindly restores the *WHOLE* XSAVE buffer to the
regsisters. If you don't update the XSAVE buffer, the write_pkru() will
be overwritten before returning to userspace.

> If I want to change PKRU of a task_struct other than current, do I still
> need to call __write_pkru?

No. You can't do that. Seriously.

The protection keys architecture really doesn't support off-thread
manipulation of PKRU. Imagine you want to mask a bit out of PKRU, you
do the following to make key 2 memory accessible and writable:

reg = read_pkru();
reg &= 0x30;
write_pkru(reg);

Now, imagine that you tried to interrupt this poor task in the middle of
that operation. Let's say you try to *set* the bits for key 4, effectively:

pkru |= 0x300;

Now you try to do that key-4 business with an IPI.

reg = read_pkru(); // PKRU=0x30
reg &= 0x30;
-> IPI
ipireg = read_pkru(); // PKRU=0x0
ipireg |= 0x300;
write_pkru(ipireg); // PKRU=0x300
write_pkru(reg); // PKRU=0x0

You *LOST* the update from the IPI.