Re: [PATCH v6 11/12] x86/mm: Enable preemption during native_flush_tlb_multi
From: Chuyi Zhou
Date: Thu Jun 04 2026 - 23:37:10 EST
On 2026-06-05 5:15 a.m., Dave Hansen wrote:
> First, the subject needs some improvement. Add parenthesis to
> functions(), please. Second, it's literally wrong: "Enable preemption
> during native_flush_tlb_multi". It does not do that or it at least
> describes it badly.
>
> It enables preemption during *one* call to native_flush_tlb_multi().
>
Yes, the subject is misleading. Would the following subject work better?
x86/mm: Re-enable preemption before flush_tlb_multi()
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 29226d112029..d540f54f4d16 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -662,8 +662,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
>> u8 state;
>> int cpu;
>> struct kvm_steal_time *src;
>> - struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
>> + struct cpumask *flushmask;
>>
>> + guard(preempt)();
>> + flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
>> cpumask_copy(flushmask, cpumask);
>> /*
>> * We have to call flush only on online vCPUs. And
>
> This KVM modification is a complete non sequitur. It comes from nowhere.
> No comments. No mention in the changelog.
>
> Now, looking at how it's called, I guess flush_tlb_multi() lands here
> because of pv_ops. But, please have mercy on the poor reviewers and walk
> them through this.
>
> This could also be done in a separate patch. It's OK to disable
> preemption twice.
>
I will move it into a preparatory patch which disables preemption in
kvm_flush_tlb_multi() while it uses the per-cpu __pv_cpu_mask scratch
cpumask. The changelog will explain that flush_tlb_multi() may reach
kvm_flush_tlb_multi() through pv_ops, so KVM should protect its own
per-cpu storage before the x86/mm callers stop guaranteeing
preemption-disabled context around flush_tlb_multi().
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index cfc3a72477f5..58c6f3d2f993 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -1421,9 +1421,11 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>> if (mm_global_asid(mm)) {
>> broadcast_tlb_flush(info);
>> } else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
>> + put_cpu();
>> info->trim_cpumask = should_trim_cpumask(mm);
>> flush_tlb_multi(mm_cpumask(mm), info);
>> consider_global_asid(mm);
>> + goto invalidate;
>> } else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
>> lockdep_assert_irqs_enabled();
>> local_irq_disable();
>> @@ -1432,6 +1434,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>> }
>>
>> put_cpu();
>> +invalidate:
>> mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
>> }
>
> I really don't like the goto. Can this be refactored to not use a goto?
>
> I'd honestly rather it be:
>
> if (foo) {
> broadcast_tlb_flush(info);
> put_cpu();
> } else if (bar) {
> put_cpu();
> flush_tlb_multi();
> } else {
> flush_tlb_func(info);
> put_cpu();
> }
>
> than have the goto. At least that ^ makes it obvious that each case
> needs a put_cpu(). But I also just generally don't like how the code is
> structured at this point.
>
> Does anybody have any smart ideas?
would the following structure look better to you?
bool remote_flush = false;
int cpu = get_cpu();
if (mm_global_asid(mm)) {
broadcast_tlb_flush(info);
} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
remote_flush = true;
} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
lockdep_assert_irqs_enabled();
local_irq_disable();
flush_tlb_func(info);
local_irq_enable();
}
put_cpu();
if (remote_flush) {
info->trim_cpumask = should_trim_cpumask(mm);
flush_tlb_multi(mm_cpumask(mm), info);
consider_global_asid(mm);
}
And similarly for arch_tlbbatch_flush():
bool remote_flush = false;
int cpu = get_cpu();
...
if (cpu_feature_enabled(X86_FEATURE_INVLPGB) && batch->unmapped_pages) {
invlpgb_flush_all_nonglobals();
batch->unmapped_pages = false;
} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
remote_flush = true;
} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
lockdep_assert_irqs_enabled();
local_irq_disable();
flush_tlb_func(&info);
local_irq_enable();
}
put_cpu();
if (remote_flush)
flush_tlb_multi(&batch->cpumask, &info);
cpumask_clear(&batch->cpumask);