Re: [PATCH V3 01/11] ARM: Disable FIQs (but not IRQs) on CPUs shutdown paths

From: Guilherme G. Piccoli
Date: Sun Sep 18 2022 - 10:05:45 EST


On 19/08/2022 19:17, Guilherme G. Piccoli wrote:
> Currently the regular CPU shutdown path for ARM disables IRQs/FIQs
> in the secondary CPUs - smp_send_stop() calls ipi_cpu_stop(), which
> is responsible for that. IRQs are architecturally masked when we
> take an interrupt, but FIQs are high priority than IRQs, hence they
> aren't masked. With that said, it makes sense to disable FIQs here,
> but there's no need for (re-)disabling IRQs.
>
> More than that: there is an alternative path for disabling CPUs,
> in the form of function crash_smp_send_stop(), which is used for
> kexec/panic path. This function relies on a SMP call that also
> triggers a busy-wait loop [at machine_crash_nonpanic_core()], but
> without disabling FIQs. This might lead to odd scenarios, like
> early interrupts in the boot of kexec'd kernel or even interrupts
> in secondary "disabled" CPUs while the main one still works in the
> panic path and assumes all secondary CPUs are (really!) off.
>
> So, let's disable FIQs in both paths and *not* disable IRQs a second
> time, since they are already masked in both paths by the architecture.
> This way, we keep both CPU quiesce paths consistent and safe.
>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: Michael Kelley <mikelley@xxxxxxxxxxxxx>
> Cc: Russell King <linux@xxxxxxxxxxxxxxx>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>
>
> ---
>
> V3:
> - No changes.
>
> V2:
> - Small wording improvement (thanks Michael Kelley);
> - Only disable FIQs, since IRQs are masked by architecture
> definition when we take an interrupt. Thanks a lot Russell
> and Marc for the discussion [0].
>
> Should we add a Fixes tag here? If so, maybe the proper target is:
> b23065313297 ("ARM: 6522/1: kexec: Add call to non-crashing cores through IPI")
>
> [0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@xxxxxxxxxxxxxxxxxxxxx/
>
>
> arch/arm/kernel/machine_kexec.c | 2 ++
> arch/arm/kernel/smp.c | 5 ++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index f567032a09c0..0b482bcb97f7 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -77,6 +77,8 @@ void machine_crash_nonpanic_core(void *unused)
> {
> struct pt_regs regs;
>
> + local_fiq_disable();
> +
> crash_setup_regs(&regs, get_irq_regs());
> printk(KERN_DEBUG "CPU %u will stop doing anything useful since another CPU has crashed\n",
> smp_processor_id());
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 978db2d96b44..36e6efad89f3 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -600,6 +600,8 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
> */
> static void ipi_cpu_stop(unsigned int cpu)
> {
> + local_fiq_disable();
> +
> if (system_state <= SYSTEM_RUNNING) {
> raw_spin_lock(&stop_lock);
> pr_crit("CPU%u: stopping\n", cpu);
> @@ -609,9 +611,6 @@ static void ipi_cpu_stop(unsigned int cpu)
>
> set_cpu_online(cpu, false);
>
> - local_fiq_disable();
> - local_irq_disable();
> -
> while (1) {
> cpu_relax();
> wfe();

[+CC all ARM folks I could find]

Hi folks, sorry for the ping. Any reviews are greatly appreciated in
this one - this is the V3 of the series but I never got any specific
review for this patch.

Based on a previous discussion with Russell and Marc [0], seems this one
makes sense and fixes an inconsistency related to kdump/panic.

Really appreciate reviews or at least some indication on which path I
should take to get this potentially merged.

Cheers,


Guilherme


[0] https://lore.kernel.org/lkml/Ymxcaqy6DwhoQrZT@xxxxxxxxxxxxxxxxxxxxx/