Re: [PATCH v14 05/13] x86/mm: use INVLPGB in flush_tlb_all
From: Borislav Petkov
Date: Sat Mar 01 2025 - 07:21:00 EST
On Fri, Feb 28, 2025 at 11:18:04AM -0800, Dave Hansen wrote:
> We haven't talked at all about the locking rules for
> invlpgb_flush_all(). It was used once in this series without any
> explicit preempt twiddling. I assume that was because it was used in a
> path where preempt is disabled.
>
> If it does need a universal rule about preempt, can we please add an:
>
> lockdep_assert_preemption_disabled()
>
> along with a comment about why it needs preempt disabled?
So, after talking on IRC last night, below is what I think we should do ontop.
More specifically:
- I've pushed the preemption guard inside the functions which do
INVLPGB+TLBSYNC so that callers do not have to think about it.
- invlpgb_kernel_range_flush() I still don't like and we have to rely there on
cant_migrate() in __tlbsync() - I'd like for all of them to be nicely packed
but don't have an idea yet how to do that cleanly...
- document what means for bits rax[0:2] being clear when issuing INVLPGB
That ok?
Anything I've missed?
If not, I'll integrate this into the patches.
Thx.
diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 45d9c7687d61..0d90ceeb472b 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -39,6 +39,10 @@ static inline void invlpg(unsigned long addr)
* the first page, while __invlpgb gets the more human readable number of
* pages to invalidate.
*
+ * The bits in rax[0:2] determine respectively which components of the address
+ * (VA, PCID, ASID) get compared when flushing. If neither bits are set, *any*
+ * address in the specified range matches.
+ *
* TLBSYNC is used to ensure that pending INVLPGB invalidations initiated from
* this CPU have completed.
*/
@@ -60,10 +64,10 @@ static inline void __invlpgb(unsigned long asid, unsigned long pcid,
static inline void __tlbsync(void)
{
/*
- * tlbsync waits for invlpgb instructions originating on the
- * same CPU to have completed. Print a warning if we could have
- * migrated, and might not be waiting on all the invlpgbs issued
- * during this TLB invalidation sequence.
+ * TLBSYNC waits for INVLPGB instructions originating on the same CPU
+ * to have completed. Print a warning if the task has been migrated,
+ * and might not be waiting on all the INVLPGBs issued during this TLB
+ * invalidation sequence.
*/
cant_migrate();
@@ -106,6 +110,13 @@ static inline void invlpgb_flush_single_pcid_nosync(unsigned long pcid)
/* Flush all mappings, including globals, for all PCIDs. */
static inline void invlpgb_flush_all(void)
{
+ /*
+ * TLBSYNC at the end needs to make sure all flushes done on the
+ * current CPU have been executed system-wide. Therefore, make
+ * sure nothing gets migrated in-between but disable preemption
+ * as it is cheaper.
+ */
+ guard(preempt)();
__invlpgb(0, 0, 0, 1, 0, INVLPGB_INCLUDE_GLOBAL);
__tlbsync();
}
@@ -119,10 +130,7 @@ static inline void invlpgb_flush_addr_nosync(unsigned long addr, u16 nr)
/* Flush all mappings for all PCIDs except globals. */
static inline void invlpgb_flush_all_nonglobals(void)
{
- /*
- * @addr=0 means both rax[1] (valid PCID) and rax[2] (valid ASID) are clear
- * so flush *any* PCID and ASID.
- */
+ guard(preempt)();
__invlpgb(0, 0, 0, 1, 0, 0);
__tlbsync();
}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f49627e02311..8cd084bc3d98 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1075,19 +1075,11 @@ void flush_tlb_all(void)
count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
/* First try (faster) hardware-assisted TLB invalidation. */
- if (cpu_feature_enabled(X86_FEATURE_INVLPGB)) {
- /*
- * TLBSYNC at the end needs to make sure all flushes done
- * on the current CPU have been executed system-wide.
- * Therefore, make sure nothing gets migrated
- * in-between but disable preemption as it is cheaper.
- */
- guard(preempt)();
+ if (cpu_feature_enabled(X86_FEATURE_INVLPGB))
invlpgb_flush_all();
- } else {
+ else
/* Fall back to the IPI-based invalidation. */
on_each_cpu(do_flush_tlb_all, NULL, 1);
- }
}
/* Flush an arbitrarily large range of memory with INVLPGB. */
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette