Re: [PATCH v2 1/1] mm/mmu_gather: replace IPI with synchronize_rcu() when batch allocation fails
From: Lance Yang
Date: Tue Feb 24 2026 - 07:57:57 EST
On 2026/2/24 20:35, Peter Zijlstra wrote:
> On Tue, Feb 24, 2026 at 08:18:46PM +0800, Lance Yang wrote:
>> On 2026/2/24 19:41, Peter Zijlstra wrote:
>>> On Tue, Feb 24, 2026 at 11:07:00AM +0800, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@xxxxxxxxx>
>>>>
>>>> When freeing page tables, we try to batch them. If batch allocation fails
>>>> (GFP_NOWAIT), __tlb_remove_table_one() immediately frees that one table
>>>> without batching.
>>>>
>>>> On !CONFIG_PT_RECLAIM, the fallback sends an IPI to all CPUs via
>>>> tlb_remove_table_sync_one(). It disrupts all CPUs even when only a single
>>>> process is unmapping memory. IPI broadcasts have been reported to hurt RT
>>>> workloads[1].
>>>>
>>>> tlb_remove_table_sync_one() synchronizes with lockless page-table walkers
>>>> (e.g. GUP-fast) that rely on IRQ disabling. These walkers use
>>>> local_irq_disable(), which is also an RCU read-side critical section.
>>>>
>>>> This patch introduces tlb_remove_table_sync_rcu(), which waits for an RCU
>>>> grace period (synchronize_rcu()) instead of broadcasting an IPI. This
>>>> provides the same guarantee as the IPI but without disrupting all CPUs.
>>>> Since batch allocation has already failed, we are on a slow path where
>>>> sleeping is acceptable: we are in process context (unmap_region,
>>>> exit_mmap) with only mmap_lock held. might_sleep() will catch any invalid
>>>> context.
>>> So sending the IPIs also requires non-atomic context, so no change there.
>>
>> Yeah, you're right!
>>
>>> What isn't explained, and very much not clear to me, is why
>>> tlb_remove_table_sync_one() is retained?
>>
>> Good point. tlb_remove_table_sync_one() is still needed in:
>>
>> 1) khugepaged (mm/khugepaged.c) - after pmdp_collapse_flush()
>> 2) tlb_finish_mmu() (tlb.h) - when tlb->fully_unshared_tables
>> 3) ...
>>
>> These are not slow paths like batch allocation failure. This patch
>> converts only the obvious slow path first.
>>
>> I'm working on converting the remaining callers as well, but not with
>> RCU; I'm looking at other options (e.g. targeted IPI).
>
> OK, so with that addition to the Changelog,
>
> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Thanks for taking time to review!
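
For anyone skimming the thread, the fallback discussed above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the patch itself: every model_* name and all of the counters are hypothetical stand-ins, and the real tlb_remove_table_sync_one() and synchronize_rcu() calls are replaced by counters so that the control flow can be exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counters recording which path and which sync were taken. */
static int model_ipi_broadcasts;    /* stands in for tlb_remove_table_sync_one() */
static int model_rcu_grace_periods; /* stands in for tlb_remove_table_sync_rcu() */
static int model_tables_batched;    /* tables queued on the normal path */
static int model_tables_freed_now;  /* tables freed directly on the slow path */

/* Models an IPI broadcast: every CPU would be interrupted. */
static void model_sync_one(void)
{
    model_ipi_broadcasts++;
}

/* Models waiting for an RCU grace period: only the caller sleeps. */
static void model_sync_rcu(void)
{
    model_rcu_grace_periods++;
}

/* Models a GFP_NOWAIT batch allocation, which may fail instead of sleeping. */
static void *model_alloc_batch(bool alloc_fails)
{
    return alloc_fails ? NULL : malloc(64);
}

/*
 * Models removing one page table.
 * Normal path: the batch allocation succeeds and the table is queued.
 * Slow path: the allocation failed, so synchronize with lockless walkers
 * first, then free the single table without batching.
 */
static void model_remove_table(bool alloc_fails, bool use_rcu)
{
    void *batch = model_alloc_batch(alloc_fails);

    if (batch) {
        model_tables_batched++;
        free(batch); /* the model does not keep the batch around */
        return;
    }

    if (use_rcu)
        model_sync_rcu(); /* behaviour described for this patch */
    else
        model_sync_one(); /* previous behaviour: IPI to all CPUs */

    model_tables_freed_now++;
}
```

Calling model_remove_table(true, true) takes the slow path and bumps only the grace-period counter, while model_remove_table(true, false) bumps the IPI counter instead. In both cases the table is freed only after the synchronization step, which is the ordering the changelog relies on.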