Re: [RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching

From: Christophe Leroy
Date: Wed Apr 15 2020 - 05:12:59 EST




On 15/04/2020 at 07:16, Christopher M Riedl wrote:
On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@xxxxxx> wrote:

This patch fixes the RFC series identified below.
It fixes three points:
- Failure with CONFIG_PPC_KUAP
- Failure to write due to lack of DIRTY bit set on the 8xx
- Inadequately complex WARN post-verification

However, it has an impact on the CPU load. Here is the time
needed on an 8xx to run the ftrace selftests without and
with this series:
- Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds
- With CONFIG_STRICT_KERNEL_RWX ==> 40 seconds
- With CONFIG_STRICT_KERNEL_RWX + this series ==> 43 seconds

Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
---
arch/powerpc/lib/code-patching.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f156132e8975..4ccff427592e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
}
pte = mk_pte(page, pgprot);
+ pte = pte_mkdirty(pte);
set_pte_at(patching_mm, patching_addr, ptep, pte);
init_temp_mm(&patch_mapping->temp_mm, patching_mm);
@@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
(offset_in_page((unsigned long)addr) /
sizeof(unsigned int));
+ allow_write_to_user(patch_addr, sizeof(instr));
__patch_instruction(addr, instr, patch_addr);
+ prevent_write_to_user(patch_addr, sizeof(instr));


On radix we can map the page with PAGE_KERNEL protection, which ends up
setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
ignored (ISA v3.0b Fig. 35) since we are accessing the page with MSR[PR]=0.
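
For illustration, a minimal sketch of how that could look in map_patch()
from the RFC series; the PAGE_SHARED fallback for the non-radix case is my
assumption, not something taken from the series:

	/*
	 * Sketch only: use a privileged mapping on radix so the AMR-based
	 * KUAP check does not apply to the temporary patch mapping. The
	 * PAGE_SHARED fallback for the non-radix case is an assumption.
	 */
	pgprot_t pgprot = radix_enabled() ? PAGE_KERNEL : PAGE_SHARED;

	pte = mk_pte(page, pgprot);
	pte = pte_mkdirty(pte);	/* keep the DIRTY bit fix from this patch */
	set_pte_at(patching_mm, patching_addr, ptep, pte);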

Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
__patch_instruction() with the allow_/prevent_write_to_user() KUAP helpers,
because this is a temporary kernel mapping which really isn't userspace in
the usual sense.

On the 8xx, that's pretty different.

The PTE doesn't control whether a page is a user page or a kernel page. The only thing set in the PTE is whether the page is linked to a given PID or not.
PAGE_KERNEL means the page can be addressed with any PID.

The user access right is given by a kind of zone, which is set in the PGD entry. Every page above PAGE_OFFSET belongs to zone 0; every page below PAGE_OFFSET belongs to zone 1.

By default, zone 0 can only be accessed by the kernel, and zone 1 can only be accessed by userspace. When the kernel wants to access zone 1, it temporarily changes the properties of zone 1 to allow both kernel and user accesses.

So, if your mapping is below PAGE_OFFSET, it is in zone 1 and the kernel must unlock that zone to access it.
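
For illustration, that unlock on the 8xx more or less boils down to
reprogramming the data-side access protection groups; the register and
constant names below are quoted from memory of the 8xx KUAP code and should
be treated as assumptions:

	/*
	 * Rough sketch, not the actual kup-8xx.h code: open and close
	 * kernel access to user-zone (zone 1) pages by switching the
	 * data-side access protection groups.
	 */
	static inline void unlock_user_zone(void)
	{
		mtspr(SPRN_MD_AP, MD_APG_INIT);	/* kernel and user access */
	}

	static inline void lock_user_zone(void)
	{
		mtspr(SPRN_MD_AP, MD_APG_KUAP);	/* kernel access blocked */
	}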


And it is more or less the same on hash/32, where this is managed by segment registers. One segment register covers a 256 MB area. By default, pages below PAGE_OFFSET can only be read by the kernel; only userspace can write to them, if the PTE allows it. When the kernel needs to write at an address below PAGE_OFFSET, it must change the segment properties in the corresponding segment register.
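
As a sketch of what that means in practice: only the segment register
covering the target range needs its supervisor-key bit dropped around the
write. The helper below is illustrative only, and SR_KS is used on the
assumption that it matches the book3s/32 definition:

	/*
	 * Illustrative only, not the actual kup.h code: open kernel write
	 * access to the 256 MB segment covering 'addr' by clearing the
	 * supervisor-key bit in its segment register.
	 */
	static inline void sr_allow_kernel_write(unsigned long addr)
	{
		u32 sr = mfsrin(addr);		/* SR covering this segment */

		mtsrin(sr & ~SR_KS, addr);	/* clear the supervisor key */
		isync();			/* context sync after mtsrin */
	}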

So, in both cases, if we want the mapping to be local to a task while still allowing kernel access, we would have to define a new special area between TASK_SIZE and PAGE_OFFSET that belongs to the kernel zone.
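
Just to make the idea concrete, a purely hypothetical layout for such an
area could look like the sketch below; the names and the size are made up
for the illustration:

	/*
	 * Hypothetical window below PAGE_OFFSET that the zone / segment
	 * setup would treat as kernel-only; names and size are invented.
	 */
	#define PATCH_WINDOW_SIZE	SZ_1M
	#define PATCH_WINDOW_TOP	PAGE_OFFSET
	#define PATCH_WINDOW_BASE	(PATCH_WINDOW_TOP - PATCH_WINDOW_SIZE)

	static inline bool addr_in_patch_window(unsigned long addr)
	{
		return addr >= PATCH_WINDOW_BASE && addr < PATCH_WINDOW_TOP;
	}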

That looks complex to me for a small benefit, especially as 8xx is not SMP and neither are most of the hash/32 targets.

Christophe