Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

From: Rik van Riel
Date: Wed Jul 18 2018 - 16:58:40 EST




> On Jul 17, 2018, at 4:04 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
>
> I think you've introduced a minor-ish performance regression due to
> changing the old (admittedly terribly documented) control flow a bit.
> Before, if real_prev == next, we would skip:
>
> load_mm_cr4(next);
> switch_ldt(real_prev, next);
>
> Now we don't any more. I think you should reinstate that
> optimization. It's probably as simple as wrapping them in an if
> (real_prev != next) with a comment like /* Remote changes that would
> require a cr4 or ldt reload will unconditionally send an IPI even to
> lazy CPUs. So, if we aren't changing our mm, we don't need to refresh
> cr4 or the ldt */

Looks like switch_ldt already skips reloading the LDT when neither prev
nor next has an LDT set up, which covers prev == next in the common case:

	if (unlikely((unsigned long)prev->context.ldt |
		     (unsigned long)next->context.ldt))
		load_mm_ldt(next);
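
To make the condition concrete, here is a small userspace sketch of that
check. The struct names and ldt_reload_needed() are hypothetical stand-ins
for illustration, not the kernel definitions. As the check is written, the
reload is skipped only when both pointers are NULL:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct ldt_struct { int nr_entries; };
struct mm_context { struct ldt_struct *ldt; };
struct mm_model { struct mm_context context; };

/*
 * Returns 1 when the quoted check would call load_mm_ldt(next):
 * OR-ing the two pointers is nonzero if either mm has an LDT.
 */
static int ldt_reload_needed(const struct mm_model *prev,
			     const struct mm_model *next)
{
	return ((uintptr_t)prev->context.ldt |
		(uintptr_t)next->context.ldt) != 0;
}
```

In this model, two mms without an LDT never trigger a reload, while an mm
that does have one triggers it even when prev and next are the same mm.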

It appears that the cr4 bits have a similar optimization:

static inline void cr4_set_bits(unsigned long mask)
{
	unsigned long cr4, flags;

	local_irq_save(flags);
	cr4 = this_cpu_read(cpu_tlbstate.cr4);
	if ((cr4 | mask) != cr4)
		__cr4_set(cr4 | mask);
	local_irq_restore(flags);
}
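
The same pattern can be tried out in userspace. This is only a sketch:
shadow_cr4, hw_writes and the model_* helpers are made-up names standing
in for cpu_tlbstate.cr4 and __cr4_set(), and the interrupt disabling is
left out. It shows that a repeated request for an already-set bit never
reaches the (simulated) register write:

```c
/* Hypothetical model: shadow_cr4 stands in for cpu_tlbstate.cr4. */
static unsigned long shadow_cr4;
static unsigned int hw_writes;	/* number of simulated register writes */

/* Stand-in for __cr4_set(): update the shadow and count the write. */
static void model_cr4_write(unsigned long val)
{
	shadow_cr4 = val;
	hw_writes++;
}

/* Same shape as cr4_set_bits(): write only if a bit actually changes. */
static void model_cr4_set_bits(unsigned long mask)
{
	unsigned long cr4 = shadow_cr4;

	if ((cr4 | mask) != cr4)
		model_cr4_write(cr4 | mask);
}
```

Calling model_cr4_set_bits() twice with the same mask bumps hw_writes
only on the first call, which is the behavior that makes an extra
load_mm_cr4() cheap when nothing has changed.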

>
> Hmm. load_mm_cr4() should bypass itself when mm == &init_mm. Want to
> fix that part or should I?
>
Looks like there might not be anything to do here, after all.

On to the lazy TLB mm_struct refcounting stuff :)