Re: [PATCH] LoongArch: Report dying CPU to RCU in stop_this_cpu()
From: Guo Ren
Date: Mon Jun 22 2026 - 08:56:34 EST
On Mon, Jun 22, 2026 at 2:59 PM Huacai Chen <chenhuacai@xxxxxxxxxxx> wrote:
>
> This is a port of MIPS commit 9f3f3bdc6d9dac1 ("MIPS: smp: report dying
> CPU to RCU in stop_this_cpu()"). smp_send_stop() parks all secondary
> CPUs in stop_this_cpu(). And the function marks the CPU offline for the
> scheduler via set_cpu_online(false) but never informs RCU, so RCU keeps
> expecting a quiescent state from CPUs that are now spinning forever with
> interrupts disabled.
>
> As long as nothing waits for an RCU grace period after smp_send_stop()
> this is harmless, which is why it went unnoticed. However, since commit
> 91840be8f710370 ("irq_work: Fix use-after-free in irq_work_single() on
> PREEMPT_RT"), irq_work_sync() calls synchronize_rcu() on architectures
> without an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt()
> returns false. Any irq_work_sync() issued in the reboot/shutdown/halt
> path after smp_send_stop() then blocks on a grace period that can never
> complete, hanging the reboot:
>
> WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
> ...
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> rcu: Offline CPU 1 blocking current GP.
> rcu: Offline CPU 2 blocking current GP.
> rcu: Offline CPU 3 blocking current GP.
>
> This issue needs some hacks to reproduce, and it was not noticed on
> LoongArch because arch_irq_work_has_interrupt() usually returns true.
>
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring
> the generic CPU-hotplug offline path, so RCU stops waiting on the parked
> CPUs and grace periods can still complete. LoongArch shuts down all CPUs
> here without going through the CPU-hotplug mechanism, so this report is
> not otherwise issued.
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> Signed-off-by: Huacai Chen <chenhuacai@xxxxxxxxxxx>
> ---
> arch/loongarch/kernel/smp.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> index c191c680de66..5d792256bbb9 100644
> --- a/arch/loongarch/kernel/smp.c
> +++ b/arch/loongarch/kernel/smp.c
> @@ -707,6 +707,7 @@ static void stop_this_cpu(void *dummy)
>  	set_cpu_online(smp_processor_id(), false);
>  	calculate_cpu_foreign_map();
>  	local_irq_disable();
> +	rcutree_report_cpu_dead();
>  	while (true);
>  }
>
> --
> 2.52.0
>
Thanks for the heads-up and the fix. The reasoning is clear: the
parked CPUs never report a quiescent state to RCU, which can stall
grace periods and hang the reboot path. The change looks correct and
minimal.
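
For readers following along, the stall can be illustrated with a toy
userspace model. This is only a sketch, not kernel code: every toy_*
name below is hypothetical, and the single pending-CPU bitmask is an
assumption that stands in for Tree RCU's much more involved
bookkeeping. It shows only why a grace period cannot finish while
parked CPUs are still counted, and why an explicit "dead" report
unblocks it.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Toy stand-in for RCU state: bit n set = grace period still waits on CPU n. */
struct toy_rcu {
	unsigned int qs_pending;
};

/* Begin a grace period that waits on every CPU. */
static void toy_start_gp(struct toy_rcu *r)
{
	r->qs_pending = (1u << TOY_NR_CPUS) - 1;
}

/* A running CPU passes through a quiescent state. */
static void toy_report_qs(struct toy_rcu *r, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	r->qs_pending &= ~(1u << cpu);
}

/* A CPU about to spin forever tells the model to stop waiting on it. */
static void toy_report_cpu_dead(struct toy_rcu *r, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	r->qs_pending &= ~(1u << cpu);
}

/* The grace period is complete once no CPU is still pending. */
static bool toy_gp_done(const struct toy_rcu *r)
{
	return r->qs_pending == 0;
}

/*
 * CPU 0 stays alive and reports a quiescent state; CPUs 1..3 are parked.
 * With report_dead == false they never clear their bits, which models
 * the pre-patch behaviour described in the commit message.
 */
static bool toy_gp_completes(bool report_dead)
{
	struct toy_rcu r;
	int cpu;

	toy_start_gp(&r);
	for (cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
		if (report_dead)
			toy_report_cpu_dead(&r, cpu);
	}
	toy_report_qs(&r, 0);
	return toy_gp_done(&r);
}
```

In this model toy_gp_completes(false) returns false, since the bits
for CPUs 1 to 3 stay set forever, while toy_gp_completes(true) returns
true.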
Reviewed-by: Guo Ren <guoren@xxxxxxxxxx>
--
Best Regards
Guo Ren