Re: [PATCH v6 11/12] x86/mm: Enable preemption during native_flush_tlb_multi

From: Dave Hansen

Date: Thu Jun 04 2026 - 17:15:36 EST

First, the subject needs some improvement. Add parenthesis to
functions(), please. Second, it's literally wrong: "Enable preemption
during native_flush_tlb_multi". It does not do that or it at least
describes it badly.

It enables preemption during *one* call to native_flush_tlb_multi().

> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 29226d112029..d540f54f4d16 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -662,8 +662,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
> u8 state;
> int cpu;
> struct kvm_steal_time *src;
> - struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
> + struct cpumask *flushmask;
>
> + guard(preempt)();
> + flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
> cpumask_copy(flushmask, cpumask);
> /*
> * We have to call flush only on online vCPUs. And

This KVM modification is a complete non sequitur. It comes from nowhere.
No comments. No mention in the changelog.

Now, looking at how it's called, I guess flush_tlb_multi() lands here
because of pv_ops. But, please have mercy on the poor reviewers and walk
them through this.

This could also be done in a separate patch. It's OK to disable
preemption twice.

> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index cfc3a72477f5..58c6f3d2f993 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1421,9 +1421,11 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> if (mm_global_asid(mm)) {
> broadcast_tlb_flush(info);
> } else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
> + put_cpu();
> info->trim_cpumask = should_trim_cpumask(mm);
> flush_tlb_multi(mm_cpumask(mm), info);
> consider_global_asid(mm);
> + goto invalidate;
> } else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
> lockdep_assert_irqs_enabled();
> local_irq_disable();
> @@ -1432,6 +1434,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> }
>
> put_cpu();
> +invalidate:
> mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
> }

I really don't like the goto. Can this be refactored to not use a goto?

I'd honestly rather it be:

if (foo) {
broadcast_tlb_flush(info);
put_cpu();
} else if (bar) {
put_cpu();
flush_tlb_multi();
} else {
flush_tlb_func(info);
put_cpu();
}

than have the goto. At least that ^ makes it obvious that each case
needs a put_cpu(). But I also just generally don't like how the code is
structured at this point.

Does anybody have any smart ideas?

> @@ -1691,7 +1694,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> invlpgb_flush_all_nonglobals();
> batch->unmapped_pages = false;
> } else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
> + put_cpu();
> flush_tlb_multi(&batch->cpumask, &info);
> + goto clear;
> } else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
> lockdep_assert_irqs_enabled();
> local_irq_disable();
> @@ -1699,9 +1704,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> local_irq_enable();
> }
>
> - cpumask_clear(&batch->cpumask);
> -
> put_cpu();
> +clear:
> + cpumask_clear(&batch->cpumask);
> }

I have the same general complaint about this one. This is really just
hacked into place, leaving a mess for everyone in the future. It needs
some careful refactoring.