Re: [PATCH v14 11/13] x86/mm: do targeted broadcast flushing from tlbbatch code
From: Borislav Petkov
Date: Tue Mar 04 2025 - 06:55:41 EST
On Mon, Mar 03, 2025 at 01:47:42PM -0800, Dave Hansen wrote:
> > +static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
> > +{
> > +	__invlpgb_flush_addr_nosync(addr, nr);
> > +	if (!cpu_need_tlbsync())
> > +		cpu_set_tlbsync(true);
> > +}
>
> One thought on these:
>
> Instead of having three functions:
>
> 1. A raw __invlpgb_*_nosync()
> 2. A wrapper invlpgb_*_nosync() that flips cpu_set_tlbsync()
> 3. A wrapper invlpgb_*()
>
> Could we get away with just two? For instance, what if we had *ALL*
> __invlpgb()'s do cpu_set_tlbsync()? Then we'd universally call tlbsync().
>
> static inline void invlpgb_flush_all_nonglobals(void)
> {
> 	guard(preempt)();
> 	__invlpgb(0, 0, 0, 1, NO_STRIDE, INVLPGB_MODE_ALL_NONGLOBALS);
> 	tlbsync();
> }
>
> Then we wouldn't need any of those three new wrappers. The only downside
> is that we'd end up with paths that logically do:
>
> 	__invlpgb()
> 	cpu_set_tlbsync(true);
>
> 	if (cpu_need_tlbsync()) { // always true
> 		__tlbsync();
> 		cpu_set_tlbsync(false);
> 	}
>
> In other words, a possibly superfluous set/check/clear of the
> "need_tlbsync" state. But I'd expect that to be a pittance compared to
> the actual cost of INVLPGB/TLBSYNC.
Lemme see whether I can grasp properly what you mean:
What you really want is for the _nosync() variants to set need_tlbsync, right?
Because looking at all the call sites which do set tlbsync, the flow is this:
	__invlpgb_flush_user_nr_nosync(pcid, addr, nr, pmd_stride, freed_tables);
	if (!cpu_need_tlbsync())
		cpu_set_tlbsync(true);
So we should move that

	cpu_set_tlbsync(true);

into the __invlpgb_*_nosync() variant.
And to make it even simpler, we should drop the cpu_need_tlbsync() test too.
IOW, this:
/* Flush all mappings for a given PCID, not including globals. */
static inline void __invlpgb_flush_single_pcid_nosync(unsigned long pcid)
{
	__invlpgb(0, pcid, 0, 1, 0, INVLPGB_PCID);
	cpu_set_tlbsync(true);
}
Right?
> > static void broadcast_tlb_flush(struct flush_tlb_info *info)
> > {
> > 	bool pmd = info->stride_shift == PMD_SHIFT;
> > @@ -790,6 +821,8 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
> > 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
> > 		WARN_ON_ONCE(!irqs_disabled());
> >
> > +	tlbsync();
>
> This one is in dire need of comments.
Maybe this:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 08672350536f..b97249ffff1f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -822,6 +822,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());
 
+	/*
+	 * Finish any remote TLB flushes pending from this CPU:
+	 */
 	tlbsync();
 
 	/*
> Ditto, *especially* if this hits the init_mm state. There really
> shouldn't be any deferred flushes for the init_mm.
Basically what you said but as a comment. :-P
Merged in the rest.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette