Re: [PATCH] alpha: Fix SMP shutdown hang due to missing memory barriers

From: Magnus Lindholm

Date: Fri May 29 2026 - 09:55:12 EST


On Fri, May 29, 2026 at 1:10 AM Matt Turner <mattst88@xxxxxxxxx> wrote:
>
> Alpha has a very weak memory model. halt() makes no guarantee that
> pending stores have drained from the store buffer. If set_cpu_present()
> stores are still buffered when a secondary CPU halts, they are lost,
> and the boot CPU spins forever in the cpu_present_mask wait loop.
>
> Add mb() before halt() on secondary CPUs to flush the store buffer,
> and use smp_mb() in the boot CPU's poll loop instead of the
> compiler-only barrier() to ensure it observes secondary CPUs' stores.
>
> This avoids a deadlock on shutdown on EV7/Marvel platforms.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Assisted-by: Claude:claude-sonnet-4-6
> Signed-off-by: Matt Turner <mattst88@xxxxxxxxx>
> ---
> arch/alpha/kernel/process.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git ./arch/alpha/kernel/process.c ./arch/alpha/kernel/process.c
> index 06522451f018..d50f9cfd8333 100644
> --- ./arch/alpha/kernel/process.c
> +++ ./arch/alpha/kernel/process.c
> @@ -99,6 +99,7 @@ common_shutdown_1(void *generic_ptr)
> *pflags = flags;
> set_cpu_present(cpuid, false);
> set_cpu_possible(cpuid, false);
> + mb();
> halt();
> }
> #endif
> @@ -127,7 +128,7 @@ common_shutdown_1(void *generic_ptr)
> set_cpu_present(boot_cpuid, false);
> set_cpu_possible(boot_cpuid, false);
> while (!cpumask_empty(cpu_present_mask))
> - barrier();
> + smp_mb();
> #endif
>
> /* If booted from SRM, reset some of the original environment. */
> --
> 2.53.0
>

This looks correct to me. halt() is not a memory-ordering primitive, so on
Alpha the secondary CPU needs a real mb() before stopping. Replacing the
boot CPU's compiler-only barrier() with smp_mb() also seems appropriate for
the polling loop. It looks like you have nailed down a long-standing
memory-ordering bug, nice work!
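
For anyone following along, the pattern can be sketched in userspace with
C11 atomics and pthreads. This is only an illustration of where the two
barriers sit, not the kernel code: present_mask, secondary_shutdown() and
simulate_shutdown() are hypothetical names, atomic_thread_fence() stands in
for mb()/smp_mb(), and a thread returning stands in for halt(). It cannot
reproduce the hang itself, since C11 atomics do not lose buffered stores
the way a halted Alpha CPU apparently can.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_CPUS 8

/* Hypothetical stand-in for cpu_present_mask: one bit per "CPU". */
static atomic_uint present_mask;

/* Secondary "CPU": clear our own bit, then fence before stopping. */
static void *secondary_shutdown(void *arg)
{
	unsigned int cpu = (unsigned int)(unsigned long)arg;

	/* Plays the role of set_cpu_present(cpuid, false). */
	atomic_fetch_and_explicit(&present_mask, ~(1u << cpu),
				  memory_order_relaxed);

	/* Plays the role of mb(): order the store before we stop. */
	atomic_thread_fence(memory_order_seq_cst);

	return NULL;	/* plays the role of halt() */
}

/*
 * Boot "CPU": start ncpus - 1 secondaries, clear our own bit and poll
 * until the mask is empty. Returns 0 on success, -1 if a thread could
 * not be created.
 */
int simulate_shutdown(unsigned int ncpus)
{
	pthread_t threads[MAX_CPUS];
	unsigned int cpu;

	assert(ncpus >= 1 && ncpus <= MAX_CPUS);
	atomic_store(&present_mask, (1u << ncpus) - 1u);

	for (cpu = 1; cpu < ncpus; cpu++)
		if (pthread_create(&threads[cpu], NULL, secondary_shutdown,
				   (void *)(unsigned long)cpu))
			return -1;

	/* Plays the role of set_cpu_present(boot_cpuid, false). */
	atomic_fetch_and_explicit(&present_mask, ~1u, memory_order_relaxed);

	/* Poll loop: a full fence per iteration, as smp_mb() in the patch. */
	while (atomic_load_explicit(&present_mask, memory_order_relaxed) != 0)
		atomic_thread_fence(memory_order_seq_cst);

	for (cpu = 1; cpu < ncpus; cpu++)
		pthread_join(threads[cpu], NULL);

	return 0;
}
```

Calling simulate_shutdown(4) from a small main() should return 0 once all
four bits have been cleared. Depending on the toolchain, the program may
need to be built with -pthread.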

I've applied this patch and, for what it's worth, tested it on my
AlphaServer ES40 to make sure there are no obvious regressions on a
non-EV7 platform. The system shuts down/reboots as expected with this
change applied.

Tested-by: Magnus Lindholm <linmag7@xxxxxxxxx>
Reviewed-by: Magnus Lindholm <linmag7@xxxxxxxxx>