Re: [PATCH v2 0/3] skip redundant TLB sync IPIs

From: David Hildenbrand (Red Hat)

Date: Wed Dec 31 2025 - 07:33:39 EST


On 12/31/25 05:26, Dave Hansen wrote:
> On 12/29/25 06:52, Lance Yang wrote:
> ...
>> This series introduces a way for architectures to indicate their TLB flush
>> already provides full synchronization, allowing the redundant IPI to be
>> skipped. For now, the optimization is implemented for x86 first and applied
>> to all page table operations that free or unshare tables.
>
> I really don't like all the complexity here. Even on x86, there are
> three or more ways of deriving this. Having the pv_ops check the value
> of another pv op is also a bit unsettling.

Right. What I actually meant is that we simply have a property "bool flush_tlb_multi_implies_ipi_broadcast" that defaults to false and that we only ever set to true from the initialization code.

Without comparing the pv_ops.

That should reduce the complexity quite a bit IMHO.

But maybe you have an even better, very simple way to indicate support.
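
A minimal sketch of the flag-based idea, written as a userspace C model rather than kernel code. Only the name flush_tlb_multi_implies_ipi_broadcast comes from the discussion above; tlb_init(), its native_ipi_flush parameter, sync_one(), sync_after_flush(), and the sync_ipis_sent counter are hypothetical stand-ins for illustration.

```c
#include <stdbool.h>

/* Property from the discussion: false by default, only ever set to true
 * by initialization code. No pv_ops comparison at the call site. */
static bool flush_tlb_multi_implies_ipi_broadcast;

/* Hypothetical counter: how many sync broadcasts the model has sent. */
static int sync_ipis_sent;

/* Hypothetical init hook. The native_ipi_flush parameter is an assumption:
 * it stands for "this configuration flushes remote TLBs via IPIs". */
static void tlb_init(bool native_ipi_flush)
{
	if (native_ipi_flush)
		flush_tlb_multi_implies_ipi_broadcast = true;
}

/* Stand-in for the extra synchronization broadcast. */
static void sync_one(void)
{
	sync_ipis_sent++;
}

/* Hypothetical caller that runs after a TLB flush: skip the second
 * round of IPIs when the flush itself already implied one. */
static void sync_after_flush(void)
{
	if (flush_tlb_multi_implies_ipi_broadcast)
		return;
	sync_one();
}
```

In this model, calling sync_after_flush() before tlb_init(true) sends one sync broadcast, and calling it afterwards sends none. The call site only reads a single boolean, which is the simplification being asked for.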


> That said, complexity can be worth it with sufficient demonstrated
> gains. But:
>
>> When unsharing hugetlb PMD page tables or collapsing pages in khugepaged,
>> we send two IPIs: one for TLB invalidation, and another to synchronize
>> with concurrent GUP-fast walkers.
>
> Those aren't exactly hot paths. khugepaged is fundamentally rate
> limited. I don't think unsharing hugetlb PMD page tables just is all
> that common either.

Given that the added IPIs during unsharing broke Oracle DBs rather badly [1], I think this is actually a case worth optimizing.

I'd assume that the impact can be measured on a many-core/many-socket system with an adjusted reproducer of [1]. The impact will not be as big as what [1] fixed (we reduced the tlb_remove_table_sync_one() invocations quite drastically).

After all, tlb_remove_table_sync_one() sends an IPI to *all* CPUs in the system, not just the ones in the MM CPU mask, which is rather bad on systems with a lot of CPUs. Of course, this way we can only optimize on systems that actually send IPIs during TLB flushes.

For other systems, it will be more tricky to avoid these broadcast IPIs.
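
To put rough numbers on the "all CPUs" versus "MM CPU mask" difference, here is a toy userspace C model, not kernel code. The toy_cpumask_t type (one bit per CPU, at most 64 CPUs) and all function names are assumptions made for illustration; they are not the kernel's cpumask API.

```c
#include <stdint.h>

/* Toy CPU mask: bit N set means CPU N is included. Limited to 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Count the set bits in a mask. */
static unsigned int toy_weight(toy_cpumask_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Modeled TLB flush: IPIs go only to CPUs in the mm's mask, minus the
 * sending CPU. self must be below 64. */
static unsigned int flush_ipi_targets(toy_cpumask_t mm_mask, unsigned int self)
{
	return toy_weight(mm_mask & ~((toy_cpumask_t)1 << self));
}

/* Modeled sync broadcast: IPIs go to every online CPU, minus the
 * sending CPU. self must be below 64. */
static unsigned int broadcast_ipi_targets(toy_cpumask_t online_mask,
					  unsigned int self)
{
	return toy_weight(online_mask & ~((toy_cpumask_t)1 << self));
}
```

With 64 online CPUs and an mm that has only run on CPUs 0-3, a flush from CPU 0 targets 3 CPUs in this model, while the broadcast targets 63. The gap grows with the CPU count, which is why the broadcast hurts most on many-core/many-socket systems.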

(I have a faint recollection that the IPI broadcast through tlb_remove_table_sync_one() is a problem when called from __tlb_remove_table_one() on RT systems ...)

[1] https://lkml.kernel.org/r/20251223214037.580860-1-david@xxxxxxxxxx

--
Cheers

David