Re: [PATCH 06/12] x86/mm: use INVLPGB for kernel TLB flushes

From: Rik van Riel
Date: Fri Jan 10 2025 - 00:32:15 EST


On Thu, 2025-01-09 at 13:18 -0800, Dave Hansen wrote:
>
> But actually I think INVLPGB is *WAY* better than INVLPG here. INVLPG
> doesn't have ranged invalidation. It will only architecturally
> invalidate multiple 4K entries when the hardware fractured them in the
> first place. I think we should probably take advantage of what INVLPGB
> can do instead of following the INVLPG approach.
>
> INVLPGB will invalidate a range no matter where the underlying entries
> came from. Its "increment the virtual address at the 2M boundary" mode
> will invalidate entries of any size. That's my reading of the docs at
> least. Is that everyone else's reading too?

Ohhhh, good point! I glossed over that the first
half dozen times I was reading the document, because
I was trying to use the ASID, and working to figure
out why things kept crashing (turns out I can only
use the PCID on bare metal).

>
> So, let's pick a number "Z" which is >= invlpgb_count_max. Z could
> arguably be set to tlb_single_page_flush_ceiling. Then do this:
>
>       4k -> Z*4k => use 4k step
>    >Z*4k -> Z*2M => use 2M step
>    >Z*2M         => invalidate everything
>
> Invalidations <=Z*4k are exact. They never zap extra TLB entries.
>
> Invalidations that use the 2M step *might* unnecessarily zap some extra
> 4k mappings in the last 2M, but this is *WAY* better than invalidating
> everything.
>
This is a great idea.

Then the code in get_flush_tlb_info can adjust
start, end, and stride_shift as needed.
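
As a rough, untested illustration of that adjustment, here is a
user-space sketch of the 4k / 2M / everything decision from the
table above. The names pick_stride, struct flush_sketch and the
SKETCH_* constants are made up for this example and are not
kernel interfaces. It assumes start < end and that end is not
near the top of the address space. It also counts steps after
rounding the range out to the stride, so an unaligned range just
under Z*2M can still tip over into a full flush.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* 4k step */
#define SKETCH_PMD_SHIFT  21	/* 2M step */

/* Hypothetical stand-in for the relevant fields of a flush request. */
struct flush_sketch {
	uint64_t start;
	uint64_t end;
	unsigned int stride_shift;
	bool flush_all;
};

/*
 * Pick the smallest stride that covers [start, end) in at most z
 * steps, rounding start and end out to that stride. If even the
 * 2M stride needs more than z steps, ask for a full flush.
 */
static struct flush_sketch pick_stride(uint64_t start, uint64_t end, uint64_t z)
{
	static const unsigned int shifts[] = { SKETCH_PAGE_SHIFT, SKETCH_PMD_SHIFT };
	struct flush_sketch f = { start, end, SKETCH_PAGE_SHIFT, false };
	unsigned int i;

	for (i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
		uint64_t mask = ((uint64_t)1 << shifts[i]) - 1;
		uint64_t s = start & ~mask;		/* round down */
		uint64_t e = (end + mask) & ~mask;	/* round up */

		if (((e - s) >> shifts[i]) <= z) {
			f.start = s;
			f.end = e;
			f.stride_shift = shifts[i];
			return f;
		}
	}

	/* Too big even for the 2M step: invalidate everything. */
	f.flush_all = true;
	return f;
}
```

With z = 64, a 65 page range starting at 0 no longer fits the 4k
step, so it gets rounded out to a single 2M step instead of
flushing everything.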

INVLPGB also supports invalidation of an entire
1GB region, so we can take your idea one step
further :)

With up to 8 pages zapped by a single INVLPGB
instruction, and multiple in flight simultaneously,
maybe we could set the threshold to 64, for 8
INVLPGBs in flight at once?

That way we can invalidate up to 1/8th of a
512 entry range with individual zaps, before
just zapping the higher level entry.
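
The arithmetic behind that threshold, spelled out as a small
user-space sketch. The numbers 8 and 64 are the ones proposed
above, and the helper names and SKETCH_* constants are
hypothetical, chosen only for this example.

```c
#include <stdbool.h>

#define SKETCH_COUNT_MAX	8	/* pages zapped per INVLPGB, as assumed above */
#define SKETCH_IN_FLIGHT	8	/* INVLPGBs in flight at once, as proposed above */
#define SKETCH_THRESHOLD	(SKETCH_COUNT_MAX * SKETCH_IN_FLIGHT)
#define SKETCH_TABLE_ENTRIES	512	/* entries covered by one higher level entry */

/* Number of INVLPGB instructions needed to zap nr_pages entries. */
static unsigned int invlpgb_needed(unsigned int nr_pages)
{
	return (nr_pages + SKETCH_COUNT_MAX - 1) / SKETCH_COUNT_MAX;
}

/* Zap entries one by one up to the threshold, else go up a level. */
static bool use_individual_zaps(unsigned int nr_pages)
{
	return nr_pages <= SKETCH_THRESHOLD;
}
```

A threshold of 64 out of 512 entries is where the 1/8th comes
from: 64 pages take exactly 8 INVLPGBs, and anything above that
moves to the next level.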

> "Invalidate everything" obviously stinks, but it should only be for
> pretty darn big invalidations.

That would only come into play when we get
past several GB worth of invalidation.

--
All Rights Reversed.