Re: [PATCH v3 05/11] x86/mm: Track the TLB's tlb_gen and update the flushing algorithm

From: Borislav Petkov
Date: Thu Jun 22 2017 - 10:59:34 EST


On Thu, Jun 22, 2017 at 07:48:21AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 22, 2017 at 12:24 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> > On Wed, Jun 21, 2017 at 07:46:05PM -0700, Andy Lutomirski wrote:
> >> > I'm certainly still missing something here:
> >> >
> >> > We have f->new_tlb_gen and mm_tlb_gen to control the flushing, i.e., we
> >> > do once
> >> >
> >> > bump_mm_tlb_gen(mm);
> >> >
> >> > and once
> >> >
> >> > info.new_tlb_gen = bump_mm_tlb_gen(mm);
> >> >
> >> > and in both cases, the bumping is done on mm->context.tlb_gen.
> >> >
> >> > So why isn't that enough to do the flushing and we have to consult
> >> > info.new_tlb_gen too?
> >>
> >> The issue is a possible race. Suppose we start at tlb_gen == 1 and
> >> then two concurrent flushes happen. The first flush is a full flush
> >> and sets tlb_gen to 2. The second is a partial flush and sets tlb_gen
> >> to 3. If the second flush gets propagated to a given CPU first and it
> >
> > Maybe I'm still missing something, which is likely...
> >
> > but if the second flush gets propagated to the CPU first, the CPU will
> > have local tlb_gen 1 and thus enforce a full flush anyway because we
> > will go 1 -> 3 on that particular CPU. Or?
> >
>
> Yes, exactly. Which means I'm probably just misunderstanding your
> original question. Can you re-ask it?

Ah, simple: we control the flushing with info.new_tlb_gen and
mm->context.tlb_gen. I.e., this check:


	if (f->end != TLB_FLUSH_ALL &&
	    f->new_tlb_gen == local_tlb_gen + 1 &&
	    f->new_tlb_gen == mm_tlb_gen) {

why can't we write:

	if (f->end != TLB_FLUSH_ALL &&
	    mm_tlb_gen == local_tlb_gen + 1)

?

If mm_tlb_gen is local_tlb_gen + 2, then we'll do a full flush; if it is
local_tlb_gen + 1, then a partial one.

If the second flush is, as you say, a partial one and still gets
propagated first, the check will force a full flush anyway.

When the first flush propagates after the second, we'll ignore it
because local_tlb_gen has already advanced due to the second flush.

As a matter of fact, we could simplify the logic: if local_tlb_gen is
only mm_tlb_gen - 1, then do the requested flush type.

If mm_tlb_gen has advanced more than 1 generation, just do a full flush
unconditionally. ... and I think we do something like that already but I
think the logic could be simplified, unless I'm missing something, that is.
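
To make the comparison concrete, here is a minimal userspace sketch of the
two predicates side by side. It is only a model, not the kernel code:
struct model_flush, MODEL_FLUSH_ALL and both function names are made up
for illustration, and the generations are passed in as plain integers
instead of being read from the mm and the per-CPU state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for the kernel's TLB_FLUSH_ALL. */
#define MODEL_FLUSH_ALL UINT64_MAX

/* Hypothetical, stripped-down stand-in for the flush request 'f'. */
struct model_flush {
	uint64_t end;		/* MODEL_FLUSH_ALL means "full flush requested" */
	uint64_t new_tlb_gen;	/* generation this request brings the mm to */
};

/* The check as quoted from the patch: true means a partial flush is taken. */
static bool partial_ok_patch(const struct model_flush *f,
			     uint64_t local_tlb_gen, uint64_t mm_tlb_gen)
{
	return f->end != MODEL_FLUSH_ALL &&
	       f->new_tlb_gen == local_tlb_gen + 1 &&
	       f->new_tlb_gen == mm_tlb_gen;
}

/* The simplified check asked about above: it ignores f->new_tlb_gen. */
static bool partial_ok_proposed(const struct model_flush *f,
				uint64_t local_tlb_gen, uint64_t mm_tlb_gen)
{
	return f->end != MODEL_FLUSH_ALL &&
	       mm_tlb_gen == local_tlb_gen + 1;
}
```

For the cases discussed in this mail the two agree: with local_tlb_gen == 1
and mm_tlb_gen == 3, both refuse the partial flush, and with local_tlb_gen
== 1, mm_tlb_gen == 2 and f->new_tlb_gen == 2, both allow it. As plain
functions they are not identical, though: they differ whenever mm_tlb_gen
== local_tlb_gen + 1 but f->new_tlb_gen != mm_tlb_gen. The sketch says
nothing about whether that combination can actually happen.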

Thanks.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.