RE: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

From: PLATTNER Christoph
Date: Mon Feb 01 2021 - 05:31:04 EST


Hello to all, and thank you very much for first and second fast response.

I do not have a long history on PowerPC MMU environment, I hacked into this topic
for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

What I learn so far from this MPC5121e (variant of e300c4 core):
- It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
branch "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
used on this CPU (not supporting MMU_FTR_HPTE_TABLE)
- The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
As far I have understood, the TLB miss routines are responsible for checking permissions.
The TLB miss routines check the Linux PTE styled entries and generates the PP bits
for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...
- The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

In summary - as far as I understand it now - we have to handle the PTE bits differently
(Linux style) for PROT_NONE permissions - OR - we have to expand the permission
checking like my proposed experimental patch. (PROT_NONE is not NUMA related only,
but may not used very often ...).

Another related point:
According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB miss, as
it is an indication, that page is used. In 4.4 kernel this write back of the _PAGE_ACCESSED
bit was performed after successful permission check:

bne- DataAddressInvalid /* return if access not permitted */
ori r0,r0,_PAGE_ACCESSED /* set _PAGE_ACCESSED in pte */
/*
* NOTE! We are assuming this is not an SMP system, otherwise
* we would need to update the pte atomically with lwarx/stwcx.
*/
stw r0,0(r2) /* update PTE (accessed bit) */
/* Convert linux-style PTE to low word of PPC-style PTE */

Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is not needed, as the
PTE is never seen by the PPC chip. But I do not understand, WHY the PAGE_ACCCESSED
is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):

cmplw 0,r1,r3
mfspr r2, SPRN_SDR1
li r1, _PAGE_PRESENT | _PAGE_ACCESSED
rlwinm r2, r2, 28, 0xfffff000
bgt- 112f

What is the reason or relevance for checking this here ?
Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
Do you know the reason of change on this point ?

Another remark to Core manual relevant for this:
There is the reference manual for e300 core available (e300 RM). It includes
many remarks in range of Memory Management section, that many features
are optional or variable for dedicated implementations. On the other hand,
the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
information, which of the optional points are there or nor. According my
analysis, MPC5121e does not include any of the optional features.


Thanks a lot for first reactions
Christoph


-----Original Message-----
From: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Sent: Montag, 1. Februar 2021 07:30
To: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>; Paul Mackerras <paulus@xxxxxxxxx>; Michael Ellerman <mpe@xxxxxxxxxxxxxx>; PLATTNER Christoph <christoph.plattner@xxxxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; linuxppc-dev@xxxxxxxxxxxxxxxx
Subject: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

On book3s/32, page protection is defined by the PP bits in the PTE which provide the following protection depending on the access keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.

Since the implementation of kernel userspace access protection, PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0 and user accesses are performed with key 1. As PP00 is used for non _PAGE_USER pages, user can't access kernel pages not flagged _PAGE_USER while kernel can.

For userspace segments, both kernel and user accesses are performed with key 0, therefore pages not flagged _PAGE_USER are still accessible to the user.

This shouldn't be an issue, because userspace is expected to be accessible to the user. But unlike most other architectures, powerpc implements PROT_NONE protection by removing _PAGE_USER flag instead of flagging the page as not valid. This means that pages in userspace that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the TLB miss handler by checking _PAGE_USER permission on userspace accesses as if it was the _PAGE_PRESENT bit.

Note that this problem only is only for 603 cores. The 604+ have an hash table, and hash_page() function already implement the verification of _PAGE_USER permission on userspace pages.

Reported-by: Christoph Plattner <christoph.plattner@xxxxxxxxxxxxxxx>
Fixes: f342adca3afc ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
---
arch/powerpc/kernel/head_book3s_32.S | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..0004e8a6a58e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,11 +453,12 @@ InstructionTLBMiss:
cmplw 0,r1,r3
#endif
mfspr r2, SPRN_SDR1
- li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+ li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
#ifdef CONFIG_MODULES
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
+ li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
#endif
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
@@ -516,10 +517,11 @@ DataLoadTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
mfspr r2, SPRN_SDR1
- li r1, _PAGE_PRESENT | _PAGE_ACCESSED
+ li r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
+ li r1, _PAGE_PRESENT | _PAGE_ACCESSED
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
lwz r2,0(r2) /* get pmd entry */
@@ -593,10 +595,11 @@ DataStoreTLBMiss:
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
mfspr r2, SPRN_SDR1
- li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+ li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
+ li r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
addi r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l /* kernel page table */
112: rlwimi r2,r3,12,20,29 /* insert top 10 bits of address */
lwz r2,0(r2) /* get pmd entry */
--
2.25.0