Re: [PATCH 05/11] smp: Enable preemption early in smp_call_function_many_cond
From: Chuyi Zhou
Date: Fri Feb 06 2026 - 03:45:34 EST
Hi Peter,
On 2026/2/5 22:59, Peter Zijlstra wrote:
> On Thu, Feb 05, 2026 at 10:29:51PM +0800, Chuyi Zhou wrote:
>> Hi Peter,
>>
>> On 2026/2/5 18:57, Peter Zijlstra wrote:
>>> On Thu, Feb 05, 2026 at 10:52:36AM +0100, Peter Zijlstra wrote:
>>>> On Tue, Feb 03, 2026 at 07:23:55PM +0800, Chuyi Zhou wrote:
>>>>
>>>>> + /*
>>>>> + * Prevent the current CPU from going offline.
>>>>> + * Being migrated to another CPU and calling csd_lock_wait() may cause
>>>>> + * UAF due to smpcfd_dead_cpu() during the current CPU offline process.
>>>>> + */
>>>>> + migrate_disable();
>>>>
>>>> This is horrible crap. migrate_disable() is *NOT* supposed to be used to
>>>> serialize cpu hotplug.
>>>
>>> This was too complicated or something?
>>>
>>
>> Now most callers of smp_call*() explicitly use preempt_disable(). IIUC,
>> if we want to use cpus_read_lock(), we first need to clean up all these
>> preempt_disable() calls.
>>
>> Maybe a stupid question: Why can't migrate_disable prevent CPU removal?
>
> It can, but migrate_disable() is horrible, it should not be used if at
> all possible.
As you pointed out, using cpus_read_lock() is the simplest approach, and
indeed it was the first solution we considered.
However, 99% of callers invoke smp_call*() with preemption disabled,
some even while holding spinlocks (for example, we might trigger a TLB
flush while holding pte spinlocks).
It's difficult for us to eliminate all these preempt_disable() calls,
especially for callers that disable preemption for other reasons, and
since cpus_read_lock() may sleep it cannot be taken in those contexts.
That makes using cpus_read_lock() almost impossible here.
In our production environment, we observed that the time spent in
csd_lock_wait() can reach several milliseconds, and in extreme cases
even exceed 10ms. Generally speaking, the time spent in csd_lock_wait()
far exceeds the overhead of sending the IPI.
Disabling preemption for that entire duration would obviously hurt the
preemption latency of high-priority tasks, which is unacceptable. This
optimization primarily targets PREEMPT, although PREEMPT_RT can also
benefit from it.
Compared to the cost of disabling preemption for the whole call, using
migrate_disable() here seems an acceptable trade-off.