Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

From: Andy Lutomirski
Date: Fri Jun 01 2018 - 23:36:21 EST


On Fri, Jun 1, 2018 at 3:13 PM Rik van Riel <riel@xxxxxxxxxxx> wrote:
>
> On Fri, 1 Jun 2018 14:21:58 -0700
> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> > Hmm. I wonder if there's a more clever data structure than a bitmap
> > that we could be using here. Each CPU only ever needs to be in one
> > mm's cpumask, and each cpu only ever changes its own state in the
> > bitmask. And writes are much less common than reads for most
> > workloads.
>
> It would be easy enough to add an mm_struct pointer to the
> per-cpu tlbstate struct, and iterate over those.
>
> However, that would be an orthogonal change to optimizing
> lazy TLB mode.
>
> Does the (untested) patch below make sense as a potential
> improvement to the lazy TLB heuristic?
>
> ---8<---
> Subject: x86,tlb: workload dependent per CPU lazy TLB switch
>
> Lazy TLB mode is a tradeoff between flushing the TLB and touching
> the mm_cpumask(&init_mm) at context switch time, versus potentially
> incurring a remote TLB flush IPI while in lazy TLB mode.
>
> Whether this pays off is likely to be workload dependent more than
> anything else. However, the current heuristic keys off hardware type.
>
> This patch changes the lazy TLB mode heuristic to a dynamic, per-CPU
> decision, dependent on whether we recently received a remote TLB
> shootdown while in lazy TLB mode.
>
> This is a very simple heuristic. When a CPU receives a remote TLB
> shootdown IPI while in lazy TLB mode, a counter in the same cache
> line is set to 16. Every time we skip lazy TLB mode, the counter
> is decremented.
>
> While the counter is zero (no recent TLB flush IPIs), allow lazy TLB mode.

Hmm, cute. That's not a bad idea at all. It would be nice to get
some kind of real benchmark on both PCID and !PCID. If nothing else,
I would expect the threshold (16 in your patch) to want to be lower on
PCID systems.
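
For anyone following along, the counter heuristic from the quoted
changelog could be sketched roughly as below. This is a standalone
userspace illustration under my own assumptions, not the patch itself:
the struct cpu_tlb_sketch, its fields, and both function names are
hypothetical, and only the value 16 comes from the description above.
A PCID-aware variant would presumably pick a smaller penalty.

```c
#include <stdbool.h>

/* Penalty value taken from the quoted changelog; a tunable in practice. */
#define LAZY_TLB_PENALTY 16

/*
 * Hypothetical stand-in for the per-CPU TLB state. The real kernel
 * structure is different; both fields are assumed to share a cache line.
 */
struct cpu_tlb_sketch {
	bool is_lazy;      /* CPU is currently in lazy TLB mode */
	int  lazy_penalty; /* nonzero: a shootdown IPI hit us recently */
};

/* A remote TLB shootdown IPI arrived on this CPU. */
static void on_shootdown_ipi(struct cpu_tlb_sketch *cpu)
{
	/* Only IPIs received while lazy count against lazy mode. */
	if (cpu->is_lazy)
		cpu->lazy_penalty = LAZY_TLB_PENALTY;
}

/*
 * Context switch to a kernel thread: decide whether to go lazy.
 * Returns true if lazy TLB mode was entered, false if it was skipped.
 */
static bool try_enter_lazy(struct cpu_tlb_sketch *cpu)
{
	if (cpu->lazy_penalty > 0) {
		/* Recent IPI: skip lazy mode and pay down the penalty. */
		cpu->lazy_penalty--;
		cpu->is_lazy = false;
		return false;
	}

	/* No recent TLB flush IPIs: allow lazy TLB mode. */
	cpu->is_lazy = true;
	return true;
}
```

With this shape, a single IPI received in lazy mode makes the CPU skip
lazy mode for the next 16 eligible context switches before trying again.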

--Andy