Re: [PATCH] powerpc/32s: Fix random crashes by adding isync() after locking/unlocking KUEP

From: Segher Boessenkool
Date: Tue Aug 17 2021 - 14:09:30 EST


Hi!

On Tue, Aug 17, 2021 at 07:13:44PM +0200, Christophe Leroy wrote:
> On 17/08/2021 at 18:22, Segher Boessenkool wrote:
> >On Tue, Aug 17, 2021 at 02:43:15PM +0000, Christophe Leroy wrote:
> >>Commit b5efec00b671 ("powerpc/32s: Move KUEP locking/unlocking in C")
> >>removed the 'isync' instruction after adding/removing NX bit in user
> >>segments. The reasoning behind this change was that when setting the
> >>NX bit we don't mind it taking effect with delay as the kernel never
> >>executes text from userspace, and when clearing the NX bit this is
> >>to return to userspace and then the 'rfi' should synchronise the
> >>context.
> >>
> >>However, it looks like on book3s/32 having a hash page table, at least
> >>on the G3 processor, we get an unexpected fault from userspace, then
> >>this is followed by something wrong in the verification of MSR_PR
> >>at end of another interrupt.
> >>
> >>This is fixed by adding back the removed isync() following update
> >>of NX bit in user segment registers. Only do it for cores with a
> >>hash table, as 603 cores don't exhibit that problem and the two isync
> >>increase ./null_syscall selftest by 6 cycles on an MPC 832x.
> >>
> >>First problem: unexpected PROTFAULT
> >>
> >> [ 62.896426] WARNING: CPU: 0 PID: 1660 at arch/powerpc/mm/fault.c:354 do_page_fault+0x6c/0x5b0
> >> [ 62.918111] Modules linked in:
> >> [ 62.923350] CPU: 0 PID: 1660 Comm: Xorg Not tainted 5.13.0-pmac-00028-gb3c15b60339a #40
> >> [ 62.943476] NIP: c001b5c8 LR: c001b6f8 CTR: 00000000
> >> [ 62.954714] REGS: e2d09e40 TRAP: 0700 Not tainted (5.13.0-pmac-00028-gb3c15b60339a)
> >
> >That is not a protection fault. What causes this?
>
> That's the WARN_ON(error_code & DSISR_PROTFAULT) at
>
> https://elixir.bootlin.com/linux/v5.13/source/arch/powerpc/mm/fault.c#L354

Ah okay. How confusing :-/

> >A CSI (like isync) is required both before and after mtsr. It may work
> >on some cores without -- what part of that is luck, if there is anything
> >that guarantees it, is anyone's guess :-/
>
> kuep_lock() is called when entering interrupts, it means we recently got an
> 'rfi' to re-enable MMU.
> kuep_unlock() is called when exiting interrupts, it means we are soon going to
> call 'rfi' to go back to user.
>
> In between, nobody is going to exec any userspace code, so who minds that
> the 'mtsr' changing user segments is not completely finished?

Hey, that is my question! :-)

So why does this not work on 750 then?

> >>@@ -28,6 +30,8 @@ static inline void kuep_lock(void)
> >> 		return;
> >>
> >> 	update_user_segments(mfsr(0) | SR_NX);
> >>+	if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
> >>+		isync();	/* Context sync required after mtsr() */
> >> }
> >
> >This needs a comment why you are not doing this for systems without
> >hardware page table walk, at the least?
>
> Ok, will add a comment tomorrow.

Thanks!
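
For reference, a rough user-space sketch of how the commented version
might read. This is only a model, not the actual patch: the *_stub
helpers, has_hpte_table, the counters and the SR_NX value are all
assumptions made for illustration, and the early-return check at the
top of the real kuep_lock() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_NX 0x10000000u	/* assumed value of the segment no-execute bit */

/* Hypothetical stand-ins for the kernel helpers, for illustration only. */
static bool has_hpte_table;	/* models mmu_has_feature(MMU_FTR_HPTE_TABLE) */
static uint32_t user_sr;	/* models the user segment registers */
static int isync_count;		/* counts simulated isync() calls */

static uint32_t mfsr_stub(void) { return user_sr; }
static void update_user_segments_stub(uint32_t val) { user_sr = val; }
static void isync_stub(void) { isync_count++; }

static void kuep_lock_sketch(void)
{
	update_user_segments_stub(mfsr_stub() | SR_NX);
	/*
	 * Context sync after mtsr. Only done on cores with a hash page
	 * table: per the commit message, 603 cores do not exhibit the
	 * problem, and the two isync cost about 6 cycles per syscall there.
	 */
	if (has_hpte_table)
		isync_stub();
}

static void kuep_unlock_sketch(void)
{
	update_user_segments_stub(mfsr_stub() & ~SR_NX);
	/* Same reasoning as in kuep_lock_sketch(). */
	if (has_hpte_table)
		isync_stub();
}

/* Small self-check of the model: NX toggles, isync only with a hash table. */
static void kuep_sketch_selfcheck(void)
{
	has_hpte_table = true;
	user_sr = 0;
	isync_count = 0;

	kuep_lock_sketch();
	assert((user_sr & SR_NX) != 0 && isync_count == 1);

	kuep_unlock_sketch();
	assert((user_sr & SR_NX) == 0 && isync_count == 2);

	has_hpte_table = false;
	kuep_lock_sketch();
	assert((user_sr & SR_NX) != 0 && isync_count == 2);
}
```

The model only shows where the comment and the conditional isync would
sit; it says nothing about why the 750 needs the synchronisation here.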


Segher