[PATCH v9 06/14] smp: Enable preemption early in smp_call_function_many_cond()

From: Chuyi Zhou

Date: Tue Jun 30 2026 - 07:24:32 EST


smp_call_function_many_cond() still has to keep the caller pinned to the
current CPU while the remote IPI request is built and dispatched. Pinning
protects the queueing state and enforces the CPU-hotplug boundary, both
of which must hold until the synchronous wait starts:

- It protects the current CPU's per-CPU scratch cpumask,
cfd->cpumask_ipi. Another task running on the same CPU could otherwise
enter smp_call_function_many_cond() and reuse that scratch cpumask
before the current caller has finished building and sending the IPI
request.

- It provides the CPU-hotplug exclusion required by the CSD queueing
side. New CSDs must not be queued after smpcfd_dying_cpu() has flushed
the outgoing CPU's callback queue. Keeping preemption disabled until
all required CSDs have been queued and the corresponding IPIs have
been sent prevents CPU offline from crossing that boundary in the
middle of the queueing operation.

The CSD acquisition side also relies on that caller-side CPU pinning.
csd_lock() waits for CSD_FLAG_LOCK to clear and then marks the CSD busy
with a regular store, so another task on the same CPU must not be
allowed to acquire and reinitialize the same per-CPU CSD concurrently.

After the callbacks have been queued and the IPIs have been sent, the
caller only performs the final csd_lock_wait() completion wait. If it is
preempted there, another task running on the original CPU may enter
smp_call_function_many_cond(), but any attempt to reuse the same per-CPU
CSD will block in csd_lock() until the previous callback clears
CSD_FLAG_LOCK. The final csd_lock_wait() does not acquire or reinitialize
the CSD, so it does not need the same caller-side preemption-disabled
protection.
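
For illustration only, the asymmetry between acquiring a CSD and waiting
on it can be modelled in user space. This is a minimal sketch, not the
code in kernel/smp.c: every model_* name is hypothetical, the spin in
csd_lock() is replaced by a try-lock that reports "busy", and the memory
orderings are an assumption of the model rather than a statement about
the kernel's barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's implementation. */
#define MODEL_CSD_FLAG_LOCK 0x1u

struct model_csd {
	atomic_uint flags;	/* stands in for the CSD flag word */
	int info;		/* stands in for the callback argument */
};

static void model_csd_init(struct model_csd *csd)
{
	atomic_init(&csd->flags, 0u);
	csd->info = 0;
}

/*
 * Acquire side: check that the flag is clear, then mark the CSD busy with
 * a plain store and reinitialize it. The check and the store are two
 * separate steps, which is why the real caller must stay pinned here.
 * Returns false where the kernel's csd_lock() would keep spinning.
 */
static bool model_csd_try_lock(struct model_csd *csd, int info)
{
	if (atomic_load_explicit(&csd->flags, memory_order_acquire) &
	    MODEL_CSD_FLAG_LOCK)
		return false;
	atomic_store_explicit(&csd->flags, MODEL_CSD_FLAG_LOCK,
			      memory_order_relaxed);
	csd->info = info;
	return true;
}

/* Completion side: the remote callback has run and releases the CSD. */
static void model_csd_unlock(struct model_csd *csd)
{
	atomic_store_explicit(&csd->flags, 0u, memory_order_release);
}

/*
 * One poll of the final wait: it only reads the flag and never writes or
 * reinitializes the CSD, so it is safe from a preemptible context.
 */
static bool model_csd_wait_done(struct model_csd *csd)
{
	return !(atomic_load_explicit(&csd->flags, memory_order_acquire) &
		 MODEL_CSD_FLAG_LOCK);
}
```

In this model a second model_csd_try_lock() on a busy CSD fails and
leaves info untouched until model_csd_unlock() runs, which mirrors the
argument that a preempting task cannot reuse the CSD while the first
caller is still waiting on it.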

The wait mask is task-local, so it cannot be overwritten by another task
on the original CPU. The per-CPU CSD storage also remains allocated
across CPU offline, so csd_lock_wait() can safely dereference it even if
the target CPU is offlined after the caller is unpinned.

With those requirements satisfied, enable preemption before the
synchronous csd_lock_wait() loop. This makes the potentially long wait
preemptible and migratable while keeping the CPU-pinned section around
the remote CPU selection and IPI dispatch.

Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
---
kernel/smp.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index e76de3010b30..92f984754139 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -858,15 +858,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
unsigned int scf_flags,
smp_cond_func_t cond_func)
{
- int cpu, last_cpu, this_cpu = smp_processor_id();
struct cpumask *cpumask, *task_mask;
bool wait = scf_flags & SCF_WAIT;
struct call_function_data *cfd;
+ int cpu, last_cpu, this_cpu;
bool run_remote = false;
int nr_cpus = 0;

- lockdep_assert_preemption_disabled();
-
+ this_cpu = get_cpu();
cfd = this_cpu_ptr(&cfd_data);
task_mask = smp_task_ipi_mask(current);
if (task_mask)
@@ -952,6 +951,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
local_irq_restore(flags);
}

+ /*
+ * The IPI work has been queued and dispatched. On PREEMPT kernels,
+ * tasks created through dup_task_struct() have task-local wait masks.
+ * The boot init_task can fall back to cfd->cpumask when the mask is
+ * not inlined, but other tasks still use task-local masks and cannot
+ * overwrite it. On !PREEMPT kernels, preempt_enable() cannot schedule
+ * another task, so the per-CPU mask remains protected.
+ */
+ put_cpu();
+
if (run_remote && wait) {
for_each_cpu(cpu, cpumask) {
call_single_data_t *csd;
@@ -964,15 +973,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,

/**
* smp_call_function_many() - Run a function on a set of CPUs.
- * @mask: The set of cpus to run on (only runs on online subset).
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait (atomically) until function has completed
- * on other CPUs.
+ * @mask: The set of cpus to run on (only runs on online subset).
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait (atomically) until function has completed
+ * on other CPUs.
*
* You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler. Preemption
- * must be disabled when calling this function.
+ * hardware interrupt handler or from a bottom half handler.
*
* @func is not called on the local CPU even if @mask contains it. Consider
* using on_each_cpu_cond_mask() instead if this is not desirable.
--
2.20.1