[tip: x86/mm] x86/mm/tlb: Avoid reading mm_tlb_gen when possible

From: tip-bot2 for Nadav Amit
Date: Tue Jun 07 2022 - 12:38:49 EST


The following commit has been merged into the x86/mm branch of tip:

Commit-ID: aa44284960d550eb4d8614afdffebc68a432a9b4
Gitweb: https://git.kernel.org/tip/aa44284960d550eb4d8614afdffebc68a432a9b4
Author: Nadav Amit <namit@xxxxxxxxxx>
AuthorDate: Mon, 06 Jun 2022 11:01:23 -07:00
Committer: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
CommitterDate: Tue, 07 Jun 2022 08:48:03 -07:00

x86/mm/tlb: Avoid reading mm_tlb_gen when possible

Under extreme TLB shootdown storms, the mm's tlb_gen cacheline is
highly contended, and reading it should (arguably) be avoided as much
as possible.

Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
even when it is not necessary (e.g., the mm was already switched).
This is wasteful.

Moreover, one of the existing optimizations is to read the mm's tlb_gen
to see if there are additional in-flight TLB invalidations and, in such
a case, flush the entire TLB. However, if the request's tlb_gen was
already flushed, the benefit of checking the mm's tlb_gen is likely to
be offset by the overhead of the check itself.

Running will-it-scale with tlb_flush1_threads shows a considerable
benefit on a 56-core Skylake (up to +24%):

threads   Baseline (v5.17+)   +Patch
1         159960              160202
5         310808              308378  (-0.7%)
10        479110              490728
15        526771              562528
20        534495              587316
25        547462              628296
30        579616              666313
35        594134              701814
40        612288              732967
45        617517              749727
50        637476              735497
55        614363              778913  (+24%)
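
The patch reorders the responder side accordingly: the cheap comparison
against the CPU-private generation runs first, and the contended atomic
read happens only when a flush may actually be needed. A hedged sketch
of that ordering, continuing the user-space model above (the real code
is in the diff that follows):

  struct flush_info_model {
          uint64_t new_tlb_gen;   /* stands in for f->new_tlb_gen */
  };

  static void flush_tlb_func_model(const struct flush_info_model *f)
  {
          uint64_t mm_gen;

          /* The TLB already covers f->new_tlb_gen: return before
           * touching the contended shared cacheline at all. */
          if (f->new_tlb_gen <= local_tlb_gen_model)
                  return;

          /* Read the shared counter as late as possible, as in the
           * second hunk below. */
          mm_gen = atomic_load(&mm_tlb_gen_model);
          if (local_tlb_gen_model == mm_gen)
                  return;         /* already up to date */

          /* ... perform the flush, then record how far we got ... */
          local_tlb_gen_model = mm_gen;
  }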

Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Acked-by: Andy Lutomirski <luto@xxxxxxxxxx>
Link: https://lkml.kernel.org/r/20220606180123.2485171-1-namit@xxxxxxxxxx
---
arch/x86/mm/tlb.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d..d9314cc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
const struct flush_tlb_info *f = info;
struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
- u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
bool local = smp_processor_id() == f->initiating_cpu;
unsigned long nr_invalidate = 0;
+ u64 mm_tlb_gen;

/* This code cannot presently handle being reentered. */
VM_WARN_ON(!irqs_disabled());
@@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
return;
}

+ if (f->new_tlb_gen <= local_tlb_gen) {
+ /*
+ * The TLB is already up to date with respect to f->new_tlb_gen.
+ * While the core might still be behind mm_tlb_gen, checking
+ * mm_tlb_gen unnecessarily would have negative caching effects
+ * so avoid it.
+ */
+ return;
+ }
+
+ /*
+ * Defer mm_tlb_gen reading as long as possible to avoid cache
+ * contention.
+ */
+ mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
+
if (unlikely(local_tlb_gen == mm_tlb_gen)) {
/*
* There's nothing to do: we're already up to date. This can