Re: [RFC PATCH] arch: ARM64: add isb before enable pan

From: Zhaoyang Huang
Date: Fri Oct 08 2021 - 04:34:38 EST


On Fri, Oct 8, 2021 at 4:01 PM Will Deacon <will@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Fri, Oct 08, 2021 at 02:07:49PM +0800, Huangzhaoyang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > A set_pstate_pan failure is occasionally observed on an ARM64 system during a
> > reboot test, and can be worked around by adding an msleep in the SW context.
> > We suspect reordering against the preceding instruction that disables SW_PAN,
> > so add an isb here.
> >
> > PS:
> > The boot test failed with an invalid TTBR1_EL1 value of 0x34000000, which
> > looks like a race between on-chip PAN and SW_PAN.
>
> Sorry, but I'm struggling to understand the problem here. Please could you
> explain it in more detail?
>
> - Why does a TTBR1_EL1 value of `0x34000000` indicate a race?
> - Can you explain the race that you think might be occurring?
> - Why does an ISB prevent the race?
Please find the panic log [1], the related code [2] and a sample of the
debug patch [3] below. TTBR1_EL1 equals 0x34000000 at the time of the
panic, yet the debug patch does NOT capture it during retest, even
though it watches every site that writes ttbr1_el1 and so should catch
a bad value. The ISB is added to keep earlier sysreg accesses from
racing with the TTBR1 write and affecting the msr result (the test is
still ongoing). Could the race be between ARM64_HAS_PAN (handled
automatically by the core) and SW_PAN?

[1]
[ 0.348000] [0: migration/0: 11] Synchronous External Abort: level 1 (translation table walk) (0x96000055) at 0xffffffc000e06004
[ 0.352000] [0: migration/0: 11] Internal error: : 96000055 [#1] PREEMPT SMP
[ 0.352000] [0: migration/0: 11] Modules linked in:
[ 0.352000] [0: migration/0: 11] Process migration/0 (pid: 11, stack limit = 0x (ptrval))
[ 0.352000] [0: migration/0: 11] CPU: 0 PID: 11 Comm: migration/0 Tainted: G S 4.14.199-22631304-abA035FXXU0AUJ4_T4 #2
[ 0.352000] [0: migration/0: 11] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
[ 0.352000] [0: migration/0: 11] task: (ptrval) task.stack: (ptrval)
[ 0.352000] [0: migration/0: 11] pc : patch_alternative+0x68/0x27c
[ 0.352000] [0: migration/0: 11] lr : __apply_alternatives.llvm.7450387295891320208+0x60/0x160
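
For reference, the syndrome value in the log above can be decoded by hand.
The following is only a userspace sketch, assuming the ARMv8-A ESR_ELx
layout (EC in bits [31:26], IL in bit 25, DFSC in ISS bits [5:0], WnR in
ISS bit 6). The helper names are made up for illustration and are not
kernel APIs.

```c
#include <stdint.h>

/*
 * Field extraction for a 32-bit ESR_ELx value, following the ARMv8-A
 * layout. These helpers are hypothetical names used only in this sketch.
 */
static inline uint32_t esr_ec(uint32_t esr)   { return (esr >> 26) & 0x3f; } /* exception class */
static inline uint32_t esr_il(uint32_t esr)   { return (esr >> 25) & 0x1; }  /* instruction length */
static inline uint32_t esr_dfsc(uint32_t esr) { return esr & 0x3f; }         /* data fault status code */
static inline uint32_t esr_wnr(uint32_t esr)  { return (esr >> 6) & 0x1; }   /* write-not-read */
```

For 0x96000055 this gives EC = 0x25 (data abort taken without a change
in exception level) and DFSC = 0x15 (synchronous external abort on a
level 1 translation table walk), which agrees with the text the kernel
printed. WnR = 1 suggests the faulting access was a write.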

[2]
__apply_alternatives
  for()
    patch_alternative <---- panic here in the 2nd round of the loop, after invoking flush_icache_range
    flush_icache_range

[3]
sub \tmp1, \tmp1, #SWAPPER_DIR_SIZE
+ tst \tmp1, #0xffff80000000 // check ttbr1_el1 valid
+ b.le .
msr ttbr1_el1, \tmp1 // set reserved ASID
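
The condition tested by the debug patch in [3] can be modelled in plain
C to confirm that a value of 0x34000000 would be caught. This is only a
sketch: the function name is hypothetical, and it relies on the debug
patch's own assumption that a valid table address on this platform has
at least one bit set in [47:31], which is not verified here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask taken from the debug patch: bits [47:31]. */
#define TTBR1_CHECK_MASK 0xffff80000000ULL

/*
 * Models `tst \tmp1, #0xffff80000000` followed by `b.le .`.
 * tst clears V, and N stays 0 because bit 63 is outside the mask,
 * so the LE condition reduces to Z == 1: the CPU spins when none of
 * the masked bits are set. Hypothetical helper, for illustration only.
 */
static inline bool debug_check_would_spin(uint64_t ttbr1)
{
    return (ttbr1 & TTBR1_CHECK_MASK) == 0;
}
```

With this model, 0x34000000 has no bits set in the mask, so the check
would spin on it if that value ever reached one of the watched msr sites.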

>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > ---
> > arch/arm64/kernel/cpufeature.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index efed283..3c0de0d 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1663,6 +1663,7 @@ static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
> > WARN_ON_ONCE(in_interrupt());
> >
> > sysreg_clear_set(sctlr_el1, SCTLR_EL1_SPAN, 0);
> > + isb();
> > set_pstate_pan(1);
>
> SCTLR_EL1.SPAN only affects the PAN behaviour on taking an exception, which
> is itself a context-synchronizing event, so I can't see why the ISB makes
> any difference here (at least, for the purposes of PAN).
>
> Thanks,
>
> Will