Re: [PATCH] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Nadav Amit
Date: Mon Jun 06 2022 - 10:29:29 EST
On Mar 28, 2022, at 3:35 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Mar 22, 2022 at 10:07:57PM +0000, Nadav Amit wrote:
>> From: Nadav Amit <namit@xxxxxxxxxx>
>>
>> Under extreme TLB shootdown storms, the mm's tlb_gen cacheline is
>> highly contended, so reading it should (arguably) be avoided as much
>> as possible.
>>
>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>> even when it is not necessary (e.g., the mm was already switched).
>> This is wasteful.
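
For context, the pre-patch flow looks roughly like this (a simplified
sketch, not the literal code; the names follow arch/x86/mm/tlb.c):

    static void flush_tlb_func(void *info)
    {
            const struct flush_tlb_info *f = info;
            struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
            u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);

            /*
             * The contended cacheline is read up front, before any of
             * the early returns below can tell us it was unneeded.
             */
            u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
            u64 local_tlb_gen =
                    this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);

            /* ... early returns: mm already switched out, lazy mode ... */
    }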
>>
>> Moreover, one of the existing optimizations is to read the mm's
>> tlb_gen to see if there are additional in-flight TLB invalidations
>> and, if so, to flush the entire TLB. However, if the request's
>> tlb_gen has already been flushed, the benefit of checking the mm's
>> tlb_gen is likely to be offset by the overhead of the check itself.
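
The change is roughly the following (again a sketch rather than the
literal diff): bail out before touching the shared cacheline when the
request was already served, and defer the atomic read otherwise, so
that under a storm the stale requests never touch that cacheline:

    if (unlikely(f->new_tlb_gen <= local_tlb_gen)) {
            /*
             * The TLB is already up to date with respect to
             * f->new_tlb_gen. This CPU may still be behind mm_tlb_gen,
             * but reading the contended cacheline to find out is likely
             * to cost more than the full flush it might save.
             */
            return;
    }

    /* Defer the read as long as possible to reduce cache contention. */
    mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);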
>>
>> Running will-it-scale with tlb_flush1_threads shows a considerable
>> benefit on a 56-core Skylake (up to +24%):
>>
>> threads   Baseline (v5.17+)   +Patch
>>    1      159960              160202
>>    5      310808              308378  (-0.7%)
>>   10      479110              490728
>>   15      526771              562528
>>   20      534495              587316
>>   25      547462              628296
>>   30      579616              666313
>>   35      594134              701814
>>   40      612288              732967
>>   45      617517              749727
>>   50      637476              735497
>>   55      614363              778913  (+24%)
>
> Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Ping?