Re: [PATCH V8 10/44] Documentation/pkeys: Add initial PKS documentation

From: Dave Hansen
Date: Fri Jan 28 2022 - 18:57:58 EST


On 1/27/22 09:54, ira.weiny@xxxxxxxxx wrote:
> From: Ira Weiny <ira.weiny@xxxxxxxxx>
>
> Add initial overview and configuration information about PKS.
>
> Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> ---
> Documentation/core-api/protection-keys.rst | 57 ++++++++++++++++++++--
> 1 file changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
> index 12331db474aa..58670e3ee39e 100644
> --- a/Documentation/core-api/protection-keys.rst
> +++ b/Documentation/core-api/protection-keys.rst
> @@ -12,6 +12,9 @@ PKeys Userspace (PKU) is a feature which is found on Intel's Skylake "Scalable
> Processor" Server CPUs and later. And it will be available in future
> non-server Intel parts and future AMD processors.
>
> +Protection Keys for Supervisor pages (PKS) is available in the SDM since May
> +2020.

I'd just remove this. Folks don't need to know the SDM history. I'd
only talk about it here if they would have a hard time finding it
somehow. Seeing as its in the main SDM, I can't see how that's a problem.

> pkeys work by dedicating 4 previously Reserved bits in each page table entry to
> a "protection key", giving 16 possible keys.
>
> @@ -22,13 +25,20 @@ and Write Disable) for each of 16 keys.
> Being a CPU register, PKRU is inherently thread-local, potentially giving each
> thread a different set of protections from every other thread.
>
> -There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
> -register. The feature is only available in 64-bit mode, even though there is
> +For Userspace (PKU), there are two instructions (RDPKRU/WRPKRU) for reading and
> +writing to the register.
> +
> +For Supervisor (PKS), the register (MSR_IA32_PKRS) is accessible only to the
> +kernel through rdmsr and wrmsr.
> +
> +The feature is only available in 64-bit mode, even though there is
> theoretically space in the PAE PTEs. These permissions are enforced on data
> access only and have no effect on instruction fetches.
>
> -Syscalls
> -========
> +
> +
> +Syscalls for user space keys
> +============================
>
> There are 3 system calls which directly interact with pkeys::
>
> @@ -95,3 +105,42 @@ with a read()::
> The kernel will send a SIGSEGV in both cases, but si_code will be set
> to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
> the plain mprotect() permissions are violated.
> +
> +
> +Kernel API for PKS support
> +==========================
> +
> +Overview
> +--------
> +
> +Similar to user space pkeys, supervisor pkeys allow additional protections to
> +be defined for a supervisor mappings. Unlike user space pkeys, violations of
> +these protections result in a kernel oops.
> +
> +Supervisor Memory Protection Keys (PKS) is a feature which is found on Intel's
> +Sapphire Rapids (and later) "Scalable Processor" Server CPUs. It will also be
> +available in future non-server Intel parts.

This is a little weird. You've already talked about PKRS and then later
introduce the feature?

Also, perhaps this CPU model bit should just be next to the CPU model
bit about PKU.

> +Also qemu has support as well: https://www.qemu.org/2021/04/30/qemu-6-0-0/
> +
> +Kconfig
> +-------
> +Kernel users intending to use PKS support should depend on
> +ARCH_HAS_SUPERVISOR_PKEYS, and select ARCH_ENABLE_SUPERVISOR_PKEYS to turn on
> +this support within the core.

Maybe this should talk about the Kconfig options a bit more. Maybe even
an example:

config MY_NEW_FEATURE
depends on ARCH_HAS_SUPERVISOR_PKEYS
select ARCH_ENABLE_SUPERVISOR_PKEYS

This will make "MY_NEW_FEATURE" unavailable unless the architecture sets
ARCH_HAS_SUPERVISOR_PKEYS. It also makes it possible for multiple
independent features to "select ARCH_ENABLE_SUPERVISOR_PKEYS". PKS
support will not be compiled into the kernel unless one or more features
selects ARCH_ENABLE_SUPERVISOR_PKEYS.

> +MSR details
> +-----------
> +
> +It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing
> +but still maintains ordering properties similar to WRPKRU.

s/It should be noted that the underlying //

I'd probably say:

WRMSR is typically an architecturally serializing instruction.
However, WRMSR(MSR_IA32_PKRS) is an exceptions. It is not a
serializing instruction and instead maintains ordering
properties similar to WRPKRU.

and maybe:

Check the WRPKRU documentation in the latest version of the SDM
for details.

> +Older versions of the SDM on PKRS may be wrong with regard to this
> +serialization. The text should be the same as that of WRPKRU. From the WRPKRU
> +text:
> +
> + WRPKRU will never execute transiently. Memory accesses
> + affected by PKRU register will not execute (even transiently)
> + until all prior executions of WRPKRU have completed execution
> + and updated the PKRU register.

I wouldn't go over this. Software has bugs. Documentation has bugs. I
expect folks to use the most recent version.

BTW, there are still a few places in SDM 076 that miss mentioning the
non-serializing properties of PKRS. I also don't see anything
specifically about the speculative behavior. There might be fixes on
the way, but can you double-check?