Lockless pagetable walks need to be tracked, so that THP
splitting/collapsing is not done while one of them is in progress.

The default solution is to disable irqs before a lockless pagetable walk
and enable them again once it has finished.

In the code, this shows up as local_irq_disable() and local_irq_enable()
around some pieces of code, usually without a comment on why they are
needed.

This patch proposes a set of generic functions to be called before starting
and after finishing a lockless pagetable walk. They are meant to make it
clear that a lockless pagetable walk happens there, and to carry the
information on why the irq disable/enable is needed.

begin_lockless_pgtbl_walk()
	Insert before starting any lockless pgtable walk
end_lockless_pgtbl_walk()
	Insert after the end of any lockless pgtable walk
	(usually right after the last use of the ptep)

A memory barrier is also added, to make sure there is no speculative read
outside the interrupt-disabled area. Other than that, no change in behavior
from the current code is intended.
The generic versions can be overridden by defining
__HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL, so that arch-specific versions can
add steps while keeping the callers clean.
Signed-off-by: Leonardo Bras <leonardo@xxxxxxxxxxxxx>
---
include/asm-generic/pgtable.h | 51 +++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
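
For illustration only, a caller is expected to bracket the walk roughly as
in the userspace sketch below. The irq and barrier helpers are mocked so
that it builds outside the kernel; mock_irq_save(), mock_irq_restore(),
find_pte_lockless() and read_pte_example() are hypothetical names and are
not part of this patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Mocked irq state: 1 = enabled, 0 = disabled (assumption, userspace only) */
static int irqs_enabled = 1;

/* Stand-in for local_irq_save(): remember the old state, then disable */
static unsigned long mock_irq_save(void)
{
	unsigned long flags = (unsigned long)irqs_enabled;

	irqs_enabled = 0;
	return flags;
}

/* Stand-in for local_irq_restore(): put back whatever state was saved */
static void mock_irq_restore(unsigned long flags)
{
	irqs_enabled = (int)flags;
}

/* Stand-in for smp_mb(): a full fence from C11 atomics */
static void mock_smp_mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Same shape as the generic helpers in the patch, built on the mocks */
static unsigned long begin_lockless_pgtbl_walk(void)
{
	unsigned long irq_mask = mock_irq_save();

	mock_smp_mb();
	return irq_mask;
}

static void end_lockless_pgtbl_walk(unsigned long irq_mask)
{
	mock_smp_mb();
	mock_irq_restore(irq_mask);
}

/* Hypothetical lockless walker; only legal while irqs are disabled */
static unsigned long fake_pte = 0x1234UL;

static unsigned long *find_pte_lockless(void)
{
	assert(!irqs_enabled);
	return &fake_pte;
}

/* Caller pattern: begin, walk, last use of ptep, end */
static unsigned long read_pte_example(void)
{
	unsigned long irq_mask, val;
	unsigned long *ptep;

	irq_mask = begin_lockless_pgtbl_walk();
	ptep = find_pte_lockless();
	val = *ptep;	/* last use of ptep */
	end_lockless_pgtbl_walk(irq_mask);

	return val;
}
```

Since the saved state is returned by begin_lockless_pgtbl_walk() and handed
back to end_lockless_pgtbl_walk(), a caller that already runs with irqs
disabled keeps them disabled after the walk.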
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index e2e2bef07dd2..8d368d3c0974 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1222,6 +1222,57 @@ static inline bool arch_has_pfn_modify_check(void)
#endif
#endif
+#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
+/*
+ * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
+ * lockless pagetable walks, such as __find_linux_pte()
+ */
+static inline
+unsigned long begin_lockless_pgtbl_walk(void)
+{
+ unsigned long irq_mask;
+
+ /*
+ * Interrupts must be disabled during the lockless page table walk.
+ * Deleting page tables or splitting huge pages involves flushing TLBs,
+ * which issues interrupts that will block while they are disabled.
+ */
+ local_irq_save(irq_mask);
+
+ /*
+ * This memory barrier pairs with any code that is either trying to
+ * delete page tables or split huge pages. Without this barrier,
+ * the page tables could be read speculatively before interrupts
+ * are disabled.
+ */
+ smp_mb();
+
+ return irq_mask;
+}
+
+/*
+ * end_lockless_pgtbl_walk: Must be inserted after the last use of a pointer
+ * returned by a lockless pagetable walk, such as __find_linux_pte()
+ */
+static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)
+{
+ /*
+ * This memory barrier pairs with any code that is either trying to
+ * delete page tables or split huge pages. Without this barrier,
+ * reads of the page tables could be reordered past the point where
+ * the interrupt state is restored.
+ */
+ smp_mb();
+
+ /*
+ * Interrupts had to stay disabled for the whole lockless page table
+ * walk, see begin_lockless_pgtbl_walk(). The walk is over, so
+ * restore the interrupt state that was saved there.
+ */
+ local_irq_restore(irq_mask);
+}
+#endif
+
/*
* On some architectures it depends on the mm if the p4d/pud or pmd
* layer of the page table hierarchy is folded or not.