Re: [PATCH v4 03/12] smp: Remove get_cpu from smp_call_function_any
From: Chuyi Zhou
Date: Wed Apr 08 2026 - 23:50:13 EST
On 2026-03-31 7:30 p.m., Chuyi Zhou wrote:
> Now smp_call_function_single() would enable preemption before
> csd_lock_wait() to reduce the critical section. To allow callers of
> smp_call_function_any() to also benefit from this optimization, remove
> get_cpu()/put_cpu() from smp_call_function_any().
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
> Reviewed-by: Muchun Song <muchun.song@xxxxxxxxx>
> ---
> kernel/smp.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index b603d4229f95..80daf9dd4a25 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -761,16 +761,26 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async);
> int smp_call_function_any(const struct cpumask *mask,
> smp_call_func_t func, void *info, int wait)
> {
> + bool local = true;
> unsigned int cpu;
> int ret;
>
> - /* Try for same CPU (cheapest) */
> + /*
> + * Prevent migration to another CPU after selecting the current CPU
> + * as the target.
> + */
> cpu = get_cpu();
> - if (!cpumask_test_cpu(cpu, mask))
> +
> + /* Try for same CPU (cheapest) */
> + if (!cpumask_test_cpu(cpu, mask)) {
> cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu));
> + local = false;
> + put_cpu();
> + }
>
From sashiko[1]:
' By calling put_cpu() before smp_call_function_single() for remote
CPUs, does this open a preemption window where the selected remote CPU
could be fully offlined before the IPI is dispatched?
If the selected CPU goes offline in this window,
smp_call_function_single() will fail the cpu_online() check and return
-ENXIO. Because this function does not retry with another CPU from the
mask, it propagates -ENXIO directly to the caller even if other valid
online CPUs remain in the mask.
Could this cause spurious failures for callers in preemptible contexts
that do not hold CPU hotplug locks, violating the guarantee to run on
any online CPU in the mask? '
As a result, smp_call_function_any() can return -ENXIO as soon as the one
selected CPU goes offline, rather than returning -ENXIO only when no CPU
in the mask is online.
One way to fix this is to refactor the logic of smp_call_function_any()
and smp_call_function_single(), moving the target CPU selection logic
into smp_call_function_single() and keeping preemption disabled until
the IPI is sent out.
Another approach is to call sched_numa_find_nth_cpu() again whenever
smp_call_function_single() returns -ENXIO, and to give up only when
sched_numa_find_nth_cpu() returns nr_cpu_ids.
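To make the retry idea concrete, here is a rough userspace model of it. This
is only a sketch, not kernel code: pick_cpu(), call_single(),
call_any_retry(), the online[] array, NR_CPUS_MODEL and offline_after_pick
are all hypothetical stand-ins, the mask is a plain bitmask, and the picker
is assumed to skip offline CPUs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4              /* hypothetical stand-in for nr_cpu_ids */

static bool online[NR_CPUS_MODEL];   /* hypothetical stand-in for cpu_online_mask */
static int offline_after_pick = -1;  /* CPU that "goes offline" in the race window */

/* Stand-in for the CPU selection step: lowest online CPU in mask,
 * or NR_CPUS_MODEL when no online CPU is left in the mask. */
static int pick_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if ((mask & (1u << cpu)) && online[cpu])
			return cpu;
	return NR_CPUS_MODEL;
}

/* Stand-in for smp_call_function_single(): -ENXIO if the target is offline. */
static int call_single(int cpu, void (*func)(void *), void *info)
{
	assert(func != NULL);
	if (cpu < 0 || cpu >= NR_CPUS_MODEL || !online[cpu])
		return -ENXIO;
	func(info);
	return 0;
}

/* Retry variant: select again after -ENXIO, stop when nothing is left. */
static int call_any_retry(unsigned int mask, void (*func)(void *), void *info)
{
	int ret = -ENXIO;

	for (;;) {
		int cpu = pick_cpu(mask);

		if (cpu >= NR_CPUS_MODEL)
			break;          /* no online CPU left in the mask */

		/* Simulated preemption window: the selected CPU goes offline. */
		if (cpu == offline_after_pick)
			online[cpu] = false;

		ret = call_single(cpu, func, info);
		if (ret != -ENXIO)
			break;          /* success, or an error unrelated to hotplug */
	}
	return ret;
}

/* Example callback: counts how many times it was invoked. */
static void count_call(void *info)
{
	(*(int *)info)++;
}
```

In this model the loop terminates because a CPU that failed with -ENXIO is
offline and is therefore never selected again. In the kernel a CPU can come
back online concurrently, so a real implementation would probably need to
think about bounding the retries.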
Do you have any better suggestions?
Thanks.
[1] https://sashiko.dev/#/patchset/20260331113103.2197007-1-zhouchuyi%40bytedance.com
> ret = smp_call_function_single(cpu, func, info, wait);
> - put_cpu();
> + if (local)
> + put_cpu();
> return ret;
> }
> EXPORT_SYMBOL_GPL(smp_call_function_any);