On April 15, 2020 4:12 AM Christophe Leroy <christophe.leroy@xxxxxx> wrote:
On 15/04/2020 at 07:16, Christopher M Riedl wrote:
On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy@xxxxxx> wrote:
This patch fixes the RFC series identified below.
It fixes three points:
- Failure with CONFIG_PPC_KUAP
- Failure to write due to lack of DIRTY bit set on the 8xx
- Inadequately complex WARN post verification
However, it has an impact on the CPU load. Here is the time
needed on an 8xx to run the ftrace selftests without and
with this series:
- Without CONFIG_STRICT_KERNEL_RWX ==> 38 seconds
- With CONFIG_STRICT_KERNEL_RWX ==> 40 seconds
- With CONFIG_STRICT_KERNEL_RWX + this series ==> 43 seconds
Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
---
arch/powerpc/lib/code-patching.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f156132e8975..4ccff427592e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
}
pte = mk_pte(page, pgprot);
+ pte = pte_mkdirty(pte);
set_pte_at(patching_mm, patching_addr, ptep, pte);
init_temp_mm(&patch_mapping->temp_mm, patching_mm);
@@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
(offset_in_page((unsigned long)addr) /
sizeof(unsigned int));
+ allow_write_to_user(patch_addr, sizeof(instr));
__patch_instruction(addr, instr, patch_addr);
+ prevent_write_to_user(patch_addr, sizeof(instr));
On radix we can map the page with PAGE_KERNEL protection which ends up
setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
because this is a temporary kernel mapping which really isn't userspace in
the usual sense.
On the 8xx, that's pretty different.
The PTE doesn't control whether a page is user page or a kernel page.
The only thing that is set in the PTE is whether a page is linked to a
given PID or not.
PAGE_KERNEL means that the page can be addressed with any PID.
The user access right is given by a kind of zone, which is in the PGD
entry. All pages above PAGE_OFFSET are defined as belonging to zone 0.
All pages below PAGE_OFFSET are defined as belonging to zone 1.
By default, zone 0 can only be accessed by the kernel, and zone 1 can only
be accessed by user. When the kernel wants to access zone 1, it temporarily
changes the properties of zone 1 to allow both kernel and user accesses.
So, if your mapping is below PAGE_OFFSET, it is in zone 1 and the kernel
must unlock it to access it.
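To make the zone rule above concrete, here is a small user-space model of it. It is only a sketch of the behaviour described in this mail, not kernel code: every model_* name is hypothetical, and the PAGE_OFFSET value of 0xc0000000 is an assumption chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; the real PAGE_OFFSET is configurable. */
#define MODEL_PAGE_OFFSET 0xc0000000u

/* Zone 0 holds addresses at or above PAGE_OFFSET, zone 1 everything below. */
enum model_zone { MODEL_ZONE_KERNEL = 0, MODEL_ZONE_USER = 1 };

/* Current access rights per zone, indexed by enum model_zone. */
struct model_zone_rights {
	bool kernel_ok[2];
	bool user_ok[2];
};

/* Default state: zone 0 is kernel only, zone 1 is user only. */
static struct model_zone_rights model_default_rights(void)
{
	struct model_zone_rights r = {
		.kernel_ok = { true, false },
		.user_ok   = { false, true },
	};
	return r;
}

static enum model_zone model_zone_of(uint32_t addr)
{
	return addr >= MODEL_PAGE_OFFSET ? MODEL_ZONE_KERNEL : MODEL_ZONE_USER;
}

/* Temporarily allow both kernel and user accesses to zone 1. */
static void model_unlock_user_zone(struct model_zone_rights *r)
{
	r->kernel_ok[MODEL_ZONE_USER] = true;
}

/* Restore the default: zone 1 is user only again. */
static void model_lock_user_zone(struct model_zone_rights *r)
{
	r->kernel_ok[MODEL_ZONE_USER] = false;
}

static bool model_kernel_can_access(const struct model_zone_rights *r,
				    uint32_t addr)
{
	return r->kernel_ok[model_zone_of(addr)];
}
```

In this model, a patching address below PAGE_OFFSET is refused for the kernel until model_unlock_user_zone() is called, which is the role played by the allow_write_to_user()/prevent_write_to_user() pair in the patch.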
And this is more or less the same on hash/32. This is managed by segment
registers. One segment register corresponds to a 256 MB area. By default,
the kernel can only read pages below PAGE_OFFSET. Only user can write,
if the PTE allows it. When the kernel needs to write at an
address below PAGE_OFFSET, it must change the segment properties in the
corresponding segment register.
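As an illustration of the 256 MB granularity mentioned above: on 32-bit hash there are 16 segment registers, and the top 4 bits of the effective address select one of them. The sketch below only does that arithmetic. The model_* names are hypothetical, and any PAGE_OFFSET value passed in (for example 0xc0000000) is an assumption.

```c
#include <stdint.h>

#define MODEL_SEGMENT_SHIFT 28	/* 256 MB == 1 << 28 */
#define MODEL_SEGMENT_SIZE  (UINT64_C(1) << MODEL_SEGMENT_SHIFT)
#define MODEL_NUM_SEGMENTS  16	/* 4 GB / 256 MB */

/* Index of the segment register that covers a 32-bit effective address. */
static unsigned int model_segment_index(uint32_t addr)
{
	return addr >> MODEL_SEGMENT_SHIFT;
}

/*
 * Number of segment registers needed to cover [0, limit). The sum is
 * done in 64 bits so that a limit close to 4 GB cannot overflow.
 */
static unsigned int model_segments_below(uint32_t limit)
{
	return (unsigned int)(((uint64_t)limit + MODEL_SEGMENT_SIZE - 1)
			      >> MODEL_SEGMENT_SHIFT);
}
```

With an assumed PAGE_OFFSET of 0xc0000000, model_segments_below() gives 12, so a write below PAGE_OFFSET falls in one of segments 0 to 11, and that segment's properties are what the kernel has to change.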
So, for both cases, if we want to have it local to a task while still
allowing kernel access, it means we have to define a new special area
between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
That looks complex to me for a small benefit, especially as 8xx is not
SMP and neither are most of the hash/32 targets.
Agreed. So I guess the solution is to differentiate between radix/non-radix
and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
is enabled. Hmm, I need to think about this some more, especially if it's
acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?
I don't necessarily want to drop the local mm patching idea for non-radix
platforms since that means we would have to maintain two implementations.
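To spell out the split I have in mind, here is a rough sketch of the decision only. It is not a proposed implementation: the model_* names and the two boolean inputs are hypothetical stand-ins for whatever checks the real code would use, and whether PAGE_SHARED is acceptable here is still the open question above.

```c
#include <stdbool.h>

/* Stand-ins for the PAGE_KERNEL and PAGE_SHARED protections. */
enum model_prot { MODEL_PAGE_KERNEL, MODEL_PAGE_SHARED };

struct model_patch_plan {
	enum model_prot prot;	/* protection for the temporary mapping */
	bool needs_kuap_window;	/* wrap the write in allow_/prevent_write_to_user()? */
};

/*
 * Radix: map with PAGE_KERNEL, no KUAP window needed.
 * Non-radix: map with PAGE_SHARED and open a KUAP window, but only
 * when KUAP is enabled.
 */
static struct model_patch_plan model_plan_patch(bool is_radix,
						bool kuap_enabled)
{
	struct model_patch_plan plan;

	if (is_radix) {
		plan.prot = MODEL_PAGE_KERNEL;
		plan.needs_kuap_window = false;
	} else {
		plan.prot = MODEL_PAGE_SHARED;
		plan.needs_kuap_window = kuap_enabled;
	}
	return plan;
}
```

This keeps a single local mm patching path and confines the platform difference to the mapping protection and the optional KUAP window.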