When VMAP_STACK is enabled, the kernel stack is allocated with vmalloc().
Normally, we rely on the logic in vmalloc_fault() to update stale P*D
entries covering the vmalloc space in a task's page tables when it first
accesses the problematic region. Unfortunately, this is not sufficient when
the kernel stack resides in the vmalloc region, because vmalloc_fault() is a
C function that needs a stack to run. So we need to ensure that these P*D
entries are up to date *before* the MM switch.
Here is the symptom:

core 0: A speculative load brings the translation for the kernel stack
        address into the TLB before the page table entries for that
        stack have been created.
core 1: Creates the page table mapping for that kernel stack.
core 0: After a context switch, the kernel attempts to use the stack region.
        Although the page table is now correct, the TLB entry for the
        stack address is stale and invalid, leading to nested exceptions.
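
The sequence above can be sketched as a toy user-space model. This is
only an illustration of the stale-entry problem, not kernel code: the
toy_* names, the single-level table and the assumption that core 0's TLB
may cache an invalid translation are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8 /* hypothetical: size of the toy address space */

struct toy_tlb_entry {
	bool cached; /* a translation for this page is held in the TLB */
	bool valid;  /* what the page table said when it was cached */
};

static bool toy_page_table[TOY_PAGES];            /* shared page table */
static struct toy_tlb_entry toy_tlb[TOY_PAGES];   /* core 0's private TLB */

/* Core 0 accesses a page: true on success, false on a page fault. */
static bool toy_access(size_t vpn)
{
	if (vpn >= TOY_PAGES)
		return false;
	if (!toy_tlb[vpn].cached) {
		/* TLB miss: walk the page table and cache the result. */
		toy_tlb[vpn].cached = true;
		toy_tlb[vpn].valid = toy_page_table[vpn];
	}
	/* TLB hit: the cached result wins, even if it is out of date. */
	return toy_tlb[vpn].valid;
}

/* Core 1 creates the mapping; core 0's TLB is not touched. */
static void toy_map(size_t vpn)
{
	if (vpn < TOY_PAGES)
		toy_page_table[vpn] = true;
}

/* Stand-in for a full TLB flush on core 0. */
static void toy_flush_tlb_all(void)
{
	for (size_t i = 0; i < TOY_PAGES; i++)
		toy_tlb[i].cached = false;
}
```

In this model, an access made before toy_map() keeps failing after the
mapping exists, and only succeeds once toy_flush_tlb_all() has run.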
This fix is inspired by ARM's approach [*1] in commit a1c510d0adc6 ("ARM:
implement support for vmap'ed stacks"). The fix also performs a TLB flush
after the page tables are set up in vmalloc().
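
In the diff below, the flush is issued only when the synced range
intersects the vmalloc area. A minimal sketch of that half-open interval
test follows; the TOY_VMALLOC_* bounds and the helper name are made up
for illustration and do not match any real RISC-V memory layout.

```c
#include <stdbool.h>

/* Hypothetical bounds of a toy vmalloc area, [START, END). */
#define TOY_VMALLOC_START 0x1000UL
#define TOY_VMALLOC_END   0x2000UL

/* True if [start, end) overlaps [TOY_VMALLOC_START, TOY_VMALLOC_END). */
static bool toy_range_touches_vmalloc(unsigned long start, unsigned long end)
{
	return start < TOY_VMALLOC_END && end > TOY_VMALLOC_START;
}
```

Ranges that merely touch a boundary, such as [0x0, 0x1000), do not count
as overlapping and would not trigger a flush in this sketch.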
Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Dylan Jhong <dylan@xxxxxxxxxxxxx>
---
arch/riscv/include/asm/page.h | 4 ++++
arch/riscv/mm/tlbflush.c | 16 ++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 349fad5e35de..c9b080a72855 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -21,6 +21,10 @@
#define HPAGE_MASK (~(HPAGE_SIZE - 1))
#define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
+#ifdef CONFIG_VMAP_STACK
+#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PTE_MODIFIED
+#endif
+
/*
* PAGE_OFFSET -- the first address of the first page of memory.
* When not using MMU this corresponds to the first free page in
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index ef701fa83f36..0799978913ee 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -86,3 +86,19 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
__sbi_tlb_flush_range(vma->vm_mm, start, end - start, PMD_SIZE);
}
#endif
+
+#ifdef CONFIG_VMAP_STACK
+/*
+ * Normally, we rely on the logic in vmalloc_fault() to update stale P*D
+ * entries covering the vmalloc space in a task's page tables when it first
+ * accesses the problematic region. Unfortunately, this is not sufficient when
+ * the kernel stack resides in the vmalloc region, because vmalloc_fault() is a
+ * C function that needs a stack to run. So we need to ensure that these P*D
+ * entries are up to date *before* the MM switch.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+ if (start < VMALLOC_END && end > VMALLOC_START)
+ flush_tlb_all();
+}
+#endif