Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on

From: Bhupesh Sharma
Date: Fri Dec 14 2018 - 00:02:10 EST


On Fri, Dec 14, 2018 at 9:39 AM Qian Cai <cai@xxxxxx> wrote:
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads on each core. Each 2-core share a same
> L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> L3 cache.
>
> It turned out that this was due to the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) before turning the
> MMU on in the second kernel which caused this instruction hung,
>
> msr sctlr_el1, x0
>
> Although there is a local TLB flush in the second kernel in
> __cpu_setup(), it is called too early. When the time to turn the MMU on
> later, the TLB is dirty again from some reasons.
>
> Also tried to move the local TLB flush part around a bit inside
> __cpu_setup(), although it did complete kdump some times, it did trigger
> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> seems no way to recover remotely without reinstalling the OS. For
> example, in those places,
>
> ENTRY(__cpu_setup)
> + isb
> tlbi vmalle1
> dsb nsh
>
> or
>
> mov x0, #3 << 20
> msr cpacr_el1, x0
> + tlbi vmalle1
> + dsb nsh
>
> Since it is only necessary to flush local TLB right before turning the
> MMU on, just re-arrage the part a bit like the one in __primary_switch()
> within CONFIG_RANDOMIZE_BASE path, so it does not depends on other
> instructions in between that could pollute the TLB, and it no longer
> trigger "Synchronous Exception" as well.
>
> Signed-off-by: Qian Cai <cai@xxxxxx>
> ---
>
> v2: merge the similar part from __cpu_setup() pointed out by James.
>
> arch/arm64/kernel/head.S | 4 ++++
> arch/arm64/mm/proc.S | 3 ---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..7f555dd4577e 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> msr ttbr0_el1, x2 // load TTBR0
> msr ttbr1_el1, x1 // load TTBR1
> isb
> +
> + tlbi vmalle1 // invalidate TLB
> + dsb nsh
> +
> msr sctlr_el1, x0
> isb
> /*
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 2c75b0b903ae..14f68afdd57f 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
> */
> .pushsection ".idmap.text", "awx"
> ENTRY(__cpu_setup)
> - tlbi vmalle1 // Invalidate local TLB
> - dsb nsh
> -
> mov x0, #3 << 20
> msr cpacr_el1, x0 // Enable FP/ASIMD
> mov x0, #1 << 12 // Reset mdscr_el1 and disable
> --
> 2.17.2 (Apple Git-113)
>

Not sure why I can't reproduce on my HPE Apollo machine, so a couple
of questions:
1. How many CPUs do you enable in the kdump kernel - do you pass
'nr_cpus=1' to the kdump kernel to limit the maximum number of cores
to 1 in the kdump kernel?
2. Which firmware version do you use on your board?

Thanks,
Bhupesh