[RFC PATCH] x86/tlb: only do the TLB flush on one of the SMT siblings
From: Alex Shi
Date: Tue Apr 05 2016 - 23:17:51 EST
It seems the SMT siblings of an Intel core share the TLB pool, so flushing
the TLB on both threads just causes an extra, useless IPI and an extra flush.
The extra flush also evicts the TLB entries that the other thread has just
brought in.
That's a double waste.
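To illustrate the idea, here is a minimal userspace sketch (not the kernel
change below; the fixed 2-way SMT pairing where CPUs 2k and 2k+1 are siblings
is an assumption made only for this example):

    #include <stdio.h>
    #include <stdint.h>

    /*
     * Sketch only: assume 2-way SMT where CPUs 2k and 2k+1 are siblings,
     * and drop the second sibling from the flush mask whenever both
     * siblings would otherwise receive the flush IPI.
     */
    static uint64_t trim_smt_siblings(uint64_t flush_mask)
    {
            int cpu;

            for (cpu = 0; cpu < 64; cpu += 2) {
                    uint64_t pair = 3ULL << cpu;    /* both threads of one core */

                    /* Only drop a sibling when both threads need the flush. */
                    if ((flush_mask & pair) == pair)
                            flush_mask &= ~(1ULL << (cpu + 1));
            }
            return flush_mask;
    }

    int main(void)
    {
            /* CPUs 0-3 need a flush; 0/1 and 2/3 are sibling pairs. */
            printf("0x%llx\n", (unsigned long long)trim_smt_siblings(0xfULL));
            /* prints 0x5: only one thread per core is left in the mask */
            return 0;
    }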
A micro benchmark shows memory access time improves by about 25% on my
Haswell i7 desktop.
The munmap micro benchmark source code is here: https://lkml.org/lkml/2012/5/17/59
Test result on kernel v4.5.0:
$/home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 57ms 14072ns/time, memory access uses 48356 times/thread/ms, cost 20ns/time
Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':
18,739,808 dTLB-load-misses # 2.47% of all dTLB cache hits (43.05%)
757,380,911 dTLB-loads (34.34%)
2,125,275 dTLB-store-misses (32.23%)
318,307,759 dTLB-stores (46.32%)
32,765 iTLB-load-misses # 2.03% of all iTLB cache hits (56.90%)
1,616,237 iTLB-loads (44.47%)
41,476 tlb:tlb_flush
1.443484546 seconds time elapsed
/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 32262
Test result on kernel v4.5.0 + this patch:
$/home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 48ms 11933ns/time, memory access uses 59966 times/thread/ms, cost 16ns/time
Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':
15,984,772 dTLB-load-misses # 1.89% of all dTLB cache hits (41.72%)
844,099,241 dTLB-loads (33.30%)
1,328,102 dTLB-store-misses (52.13%)
280,902,875 dTLB-stores (52.03%)
27,678 iTLB-load-misses # 1.67% of all iTLB cache hits (35.35%)
1,659,550 iTLB-loads (38.38%)
25,137 tlb:tlb_flush
1.428880301 seconds time elapsed
/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 15912
BTW, the TLB sharing between SMT siblings that this change relies on isn't
architecturally guaranteed.
Signed-off-by: Alex Shi <alex.shi@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
To: linux-kernel@xxxxxxxxxxxxxxx
To: Mel Gorman <mgorman@xxxxxxx>
To: x86@xxxxxxxxxx
To: "H. Peter Anvin" <hpa@xxxxxxxxx>
To: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Alex Shi <alex.shi@xxxxxxxxxx>
---
arch/x86/mm/tlb.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8f4cc3d..6510316 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -134,7 +134,10 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long start,
unsigned long end)
{
+ int cpu;
struct flush_tlb_info info;
+ cpumask_t flush_mask, *sblmask;
+
info.flush_mm = mm;
info.flush_start = start;
info.flush_end = end;
@@ -151,7 +154,23 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
&info, 1);
return;
}
- smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+
+ if (unlikely(smp_num_siblings <= 1)) {
+ smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+ return;
+ }
+
+ /* Only one flush is needed for each pair of SMT siblings */
+ cpumask_copy(&flush_mask, cpumask);
+ for_each_cpu(cpu, &flush_mask) {
+ sblmask = topology_sibling_cpumask(cpu);
+ if (!cpumask_subset(sblmask, &flush_mask))
+ continue;
+
+ cpumask_clear_cpu(cpumask_next(cpu, sblmask), &flush_mask);
+ }
+
+ smp_call_function_many(&flush_mask, flush_tlb_func, &info, 1);
}
void flush_tlb_current_task(void)
--
2.7.2.333.g70bd996