Re: [PATCH RESEND v3 1/2] mm/tlb: skip redundant IPI when TLB flush already synchronized
From: David Hildenbrand (Red Hat)
Date: Fri Jan 09 2026 - 10:44:32 EST
On 1/9/26 16:30, Lance Yang wrote:
> On 2026/1/9 22:13, David Hildenbrand (Red Hat) wrote:
>> What could work is tracking "tlb_table_flush_sent_ipi" really when we
>> are flushing the TLB for removed/unshared tables, and maybe resetting
>> it ... I don't know when, off the top of my head.
>> Not sure what's the best way forward here :(
>> v2 was simpler IMHO.
>>> The main concern Dave raised was that with PV hypercalls or when
>>> INVLPGB is available, we can't tell from a static check whether IPIs
>>> were actually sent.
>> Why can't we set the boolean at runtime when initializing the pv_ops
>> structure, when we are sure that it is allowed?
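
To make that suggestion concrete, here is a completely untested
sketch; tlb_flush_uses_ipi and tlb_flush_init_ipi_mode() are made-up
names, and the INVLPGB handling is only approximate:

/*
 * Made-up name: decide once at boot whether the TLB flush path
 * actually synchronizes via IPIs, instead of guessing from a
 * compile-time check.
 */
static bool tlb_flush_uses_ipi __ro_after_init;

void __init tlb_flush_init_ipi_mode(void)
{
        /*
         * Only the native flush path synchronizes via IPIs; PV
         * hypercalls and INVLPGB broadcasts generally do not. The
         * INVLPGB + no-global-ASID case still sends IPIs, so it
         * would need extra care here.
         */
        if (pv_ops.mmu.flush_tlb_multi == native_flush_tlb_multi &&
            !cpu_feature_enabled(X86_FEATURE_INVLPGB))
                tlb_flush_uses_ipi = true;
}

The table-freeing path could then skip its explicit IPI whenever
tlb_flush_uses_ipi is set.
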
> Yes, thanks, that sounds like a reasonable trade-off :)
> As you mentioned:
> "this lifetime stuff in core-mm ends up getting more complicated than
> v2 without a clear benefit".
> I totally agree that v3 is too complicated :(
> But Dave's concern about v2 was that we can't accurately tell whether
> IPIs were actually sent in PV environments or with INVLPGB, so a
> static check misses optimization opportunities. The
> INVLPGB+no_global_asid case also sends IPIs during the TLB flush.
> Anyway, yeah, I'd rather start with a simple approach, even if it's
> not perfect. We can always improve it later ;)
> Any ideas on how to move forward?
I'd hope Dave can comment :)
In general, I saw the whole thing as a two-step process:
1) Avoid IPIs completely when the TLB flush already sent them. We can
achieve that through v2 or v3, one way or the other; I don't
particularly care as long as it is clean and simple.
2) For other configs/archs, send IPIs only to CPUs that are actually in
GUP-fast etc. That would resolve some RT headaches with broadcast IPIs.
Regarding 2), it obviously only applies to setups where 1) does not,
like x86 with INVLPGB, or arm64.
I once had the idea of letting CPUs that enter/exit GUP-fast (and
similar) indicate in a global cpumask (or per-CPU variables) that
they are in that context. Then, we can just collect these CPUs and
limit the IPIs to them (usually not a lot of them ...).
The trick here is not to slow down GUP-fast too much. One person who
played with that (Yair, in an RT context) was not able to reduce the
overhead sufficiently.
I guess the options are:
a) Per-MM CPU mask we have to update atomically when entering/leaving
GUP-fast
b) Global mask we have to update atomically when entering/leaving GUP-fast
c) Per-CPU variable we have to update when entering/leaving GUP-fast
(see the sketch below). Interrupts are disabled, so we don't have to
worry about reschedule etc.
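
A rough, untested sketch of c), with all names made up for
illustration:

/* Set while a CPU is inside GUP-fast (or a similar lockless walker). */
static DEFINE_PER_CPU(bool, in_gup_fast);

/*
 * Both helpers run with interrupts disabled, so there is no
 * migration to worry about.
 */
static inline void gup_fast_enter(void)
{
        this_cpu_write(in_gup_fast, true);
        /* Order the flag against the page table walk; pairs with the
         * sender side. */
        smp_mb();
}

static inline void gup_fast_exit(void)
{
        smp_mb();
        this_cpu_write(in_gup_fast, false);
}

/* Sender side: collect only the CPUs that are currently in GUP-fast. */
static void build_gup_fast_mask(struct cpumask *mask)
{
        int cpu;

        cpumask_clear(mask);
        for_each_online_cpu(cpu)
                if (per_cpu(in_gup_fast, cpu))
                        cpumask_set_cpu(cpu, mask);
}

The sender would then use on_each_cpu_mask() on the collected mask
instead of a broadcast. The smp_mb() on entry is what I'd expect to
hurt; presumably that's the overhead that could not be reduced
sufficiently.
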
Maybe someone reading along has other thoughts.
--
Cheers
David