Re: [PATCH] arm64: fix potential deadlock in arm64-provide-pseudo-NMI-with-GICv3

From: Julien Thierry
Date: Tue Jan 29 2019 - 08:42:24 EST


Hi Wei,

Thanks for testing the series.

On 29/01/2019 13:12, Wei Li wrote:
> In some exception handlers, the interrupt is not reenabled by daifclr at first.
> The later process may call local_irq_enable() to enable the interrupt, like
> gic_handle_irq(). As we known, function local_irq_enable() just change the pmr now.

This is not yet in, so it might be useful to point to the series that
adds this:
https://lkml.org/lkml/2019/1/21/1060

> The following codes what i found may cause a deadlock or some issues else:
>
> do_sp_pc_abort <- el0_sp_pc
> do_el0_ia_bp_hardening <- el0_ia
> kgdb_roundup_cpus <- el1_dbg
>
> Signed-off-by: Wei Li <liwei391@xxxxxxxxxx>
> ---
> arch/arm64/kernel/kgdb.c | 4 ++++
> arch/arm64/mm/fault.c | 6 ++++++
> 2 files changed, 10 insertions(+)
>
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..119fbf2c0788 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -25,6 +25,7 @@
> #include <linux/kgdb.h>
> #include <linux/kprobes.h>
> #include <linux/sched/task_stack.h>
> +#include <linux/irqchip/arm-gic-v3.h>
>
> #include <asm/debug-monitors.h>
> #include <asm/insn.h>
> @@ -291,6 +292,9 @@ static void kgdb_call_nmi_hook(void *ignored)
>
> void kgdb_roundup_cpus(unsigned long flags)

Hmm, I don't see this function defined in arch/arm64/kernel/kgdb.c in
v5.0-rc*. Was it removed, or is that something in your local tree?

> {
> + if (gic_prio_masking_enabled())
> + gic_arch_enable_irqs();
> +
> local_irq_enable();

Since we introduced the daifflags functions, with the relation described
at the top of arch/arm64/include/asm/daifflags.h, I think just calling
local_irq_enable() might not comply with this, as PSR.I would be clear
while PSR.D is set.

Maybe it should be using:

local_daif_restore(DAIF_PROCCTX);

> smp_call_function(kgdb_call_nmi_hook, NULL, 0);
> local_irq_disable();

and here:
local_daif_mask();

Although I'd like to understand what you are applying the pseudo-NMI
series on first.
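
To illustrate the concern, here is a rough userspace model, not kernel
code. It assumes the ordering convention is "D masked implies A masked,
and A masked implies I masked", and it models local_irq_enable() as
clearing only PSR.I (the non-priority-masking case). All model_* names
are hypothetical helpers made up for this sketch:

```c
#include <assert.h>
#include <stdbool.h>

/* PSTATE exception mask bits (same values as the arm64 PSR_*_BIT macros). */
#define PSR_F_BIT	0x040u
#define PSR_I_BIT	0x080u
#define PSR_A_BIT	0x100u
#define PSR_D_BIT	0x200u

#define DAIF_PROCCTX	0u
#define DAIF_MASK	(PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/*
 * Hypothetical checker for the assumed ordering: if D is masked then A
 * must be masked, and if A is masked then I must be masked.
 */
static bool model_daif_order_ok(unsigned int flags)
{
	if ((flags & PSR_D_BIT) && !(flags & PSR_A_BIT))
		return false;
	if ((flags & PSR_A_BIT) && !(flags & PSR_I_BIT))
		return false;
	return true;
}

/* Models local_irq_enable() as touching only the I bit (assumption). */
static unsigned int model_local_irq_enable(unsigned int flags)
{
	return flags & ~PSR_I_BIT;
}

/* Models local_daif_restore(): all four bits are replaced at once. */
static unsigned int model_local_daif_restore(unsigned int flags)
{
	assert(model_daif_order_ok(flags));
	return flags;
}

/* Models local_daif_mask(): every exception masked. */
static unsigned int model_local_daif_mask(void)
{
	return DAIF_MASK;
}
```

Starting from DAIF_MASK (as on entry from el1_dbg),
model_local_irq_enable() leaves D and A set with I clear, which the
checker rejects, while model_local_daif_restore(DAIF_PROCCTX) followed
by model_local_daif_mask() only ever passes through states it accepts.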

> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 97ba2ba78aee..f7c39a0b28bc 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -32,6 +32,7 @@
> #include <linux/perf_event.h>
> #include <linux/preempt.h>
> #include <linux/hugetlb.h>
> +#include <linux/irqchip/arm-gic-v3.h>
>
> #include <asm/bug.h>
> #include <asm/cmpxchg.h>
> @@ -780,6 +781,8 @@ asmlinkage void __exception do_el0_ia_bp_hardening(unsigned long addr,
> if (addr > TASK_SIZE)
> arm64_apply_bp_hardening();
>
> + if (gic_prio_masking_enabled())
> + gic_arch_enable_irqs();
> local_irq_enable();

This is not in mainline; in v5.0-rc1 there is:

local_daif_restore(DAIF_PROCCTX);

Which my series updates to modify both DAIF and PMR if needed.

So you wouldn't need to have the gic_arch_enable_irqs().
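
As a sketch of the idea only (a userspace model, not the code from the
series): the struct, the model_* names and the MODEL_PRIO_* values below
are assumptions made up for illustration. The point is that a single
restore call updates the priority mask together with the DAIF bits, so
the caller has nothing extra to do:

```c
#include <stdbool.h>

#define PSR_I_BIT		0x080u
#define DAIF_PROCCTX		0u

/* Model PMR values; assumed here for illustration only. */
#define MODEL_PRIO_IRQON	0xf0u
#define MODEL_PRIO_IRQOFF	0x70u

struct model_cpu {
	unsigned int daif;	/* PSTATE.DAIF mask bits */
	unsigned int pmr;	/* GIC priority mask register */
	bool prio_masking;	/* stands in for gic_prio_masking_enabled() */
};

/* Models a restore that handles both DAIF and PMR in one place. */
static void model_local_daif_restore(struct model_cpu *cpu, unsigned int flags)
{
	/* When IRQs are being unmasked, open up the priority mask too. */
	if (cpu->prio_masking && !(flags & PSR_I_BIT))
		cpu->pmr = MODEL_PRIO_IRQON;

	cpu->daif = flags;
}
```

With prio_masking set and pmr at MODEL_PRIO_IRQOFF, calling
model_local_daif_restore(&cpu, DAIF_PROCCTX) leaves daif at 0 and pmr at
MODEL_PRIO_IRQON without any separate PMR write by the caller.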

> do_mem_abort(addr, esr, regs);
> }
> @@ -794,6 +797,9 @@ asmlinkage void __exception do_sp_pc_abort(unsigned long addr,
> if (user_mode(regs)) {
> if (instruction_pointer(regs) > TASK_SIZE)
> arm64_apply_bp_hardening();
> +
> + if (gic_prio_masking_enabled())
> + gic_arch_enable_irqs();
> local_irq_enable();

Same here.

Thanks,

--
Julien Thierry