[PATCH v17 11/26] x86/mm: Update ptep_set_wrprotect() and pmdp_set_wrprotect() for transition from _PAGE_DIRTY to _PAGE_COW

From: Yu-cheng Yu
Date: Tue Dec 29 2020 - 16:33:34 EST


When Shadow Stack is introduced, a [R/O + _PAGE_DIRTY] PTE is reserved for
shadow stack. Copy-on-write PTEs then use [R/O + _PAGE_COW] instead.
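
The encoding above can be illustrated with a small user-space sketch. It is
not kernel code: the SK_* names and helpers are hypothetical, the Present bit
is ignored for brevity, and the bit position chosen for COW is only an
assumption (the description above does not state which software bit is used).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RW (bit 1) and Dirty (bit 6) follow the x86 page-table entry layout. */
#define SK_PAGE_RW    (UINT64_C(1) << 1)
#define SK_PAGE_DIRTY (UINT64_C(1) << 6)
/* Assumption: COW is a software-available bit; 58 is illustrative only. */
#define SK_PAGE_COW   (UINT64_C(1) << 58)

static_assert((SK_PAGE_COW & (SK_PAGE_RW | SK_PAGE_DIRTY)) == 0,
              "COW must not alias RW or Dirty");

/* Read-only and Dirty: the combination reserved for shadow stack. */
static bool sk_is_shadow_stack(uint64_t v)
{
    return !(v & SK_PAGE_RW) && (v & SK_PAGE_DIRTY);
}

/* Read-only with COW set and Dirty clear: an ordinary copy-on-write entry. */
static bool sk_is_cow(uint64_t v)
{
    return !(v & SK_PAGE_RW) && !(v & SK_PAGE_DIRTY) && (v & SK_PAGE_COW);
}
```

With this split, a write-protected entry that keeps Dirty set would be
classified as shadow stack, which is why Dirty has to be moved to COW.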

When a PTE goes from [R/W + _PAGE_DIRTY] to [R/O + _PAGE_COW], it could
become a transient shadow stack PTE in two cases:

The first case is that some processors can start a write but end up seeing
a read-only PTE by the time they get to the Dirty bit, creating a transient
shadow stack PTE. However, this does not occur on processors supporting
Shadow Stack, so no TLB flush is needed here.

The second case is that when software tests and replaces _PAGE_DIRTY with
_PAGE_COW non-atomically, a transient shadow stack PTE can exist.
This is prevented by doing the update with cmpxchg.
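
As a rough user-space illustration of the cmpxchg approach (not the kernel
implementation), the sketch below uses C11 atomics in place of try_cmpxchg().
The SK_* names and helpers are hypothetical, and the COW bit position is an
assumption made only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SK_PAGE_RW    (UINT64_C(1) << 1)
#define SK_PAGE_DIRTY (UINT64_C(1) << 6)
/* Assumption: COW is a software-available bit; 58 is illustrative only. */
#define SK_PAGE_COW   (UINT64_C(1) << 58)

/* Compute the write-protected value: clear RW and move Dirty to COW. */
static uint64_t sk_wrprotect(uint64_t v)
{
    v &= ~SK_PAGE_RW;
    if (v & SK_PAGE_DIRTY) {
        v &= ~SK_PAGE_DIRTY;
        v |= SK_PAGE_COW;
    }
    return v;
}

/*
 * Publish the new value in a single atomic step. If the entry changed
 * after it was read, the compare-exchange fails, reloads the current
 * value into old_val, and the new value is recomputed. No intermediate
 * read-only + Dirty value is ever stored.
 */
static void sk_set_wrprotect(_Atomic uint64_t *entry)
{
    uint64_t old_val = atomic_load(entry);
    uint64_t new_val;

    do {
        new_val = sk_wrprotect(old_val);
    } while (!atomic_compare_exchange_weak(entry, &old_val, new_val));

    /* The stored value is read-only and never read-only + Dirty. */
    assert(!(new_val & (SK_PAGE_RW | SK_PAGE_DIRTY)));
}
```

The point of the loop is that the read-only, non-Dirty value is computed off
to the side and installed in one step, rather than clearing RW first and
fixing up Dirty afterwards.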

Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights into the issue. Jann Horn provided the cmpxchg solution.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>
---
arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 666c25ab9564..1c84f1ba32b9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1226,6 +1226,32 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
static inline void ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
+ /*
+ * Some processors can start a write, but end up seeing a read-only
+ * PTE by the time they get to the Dirty bit. In this case, they
+ * will set the Dirty bit, leaving a read-only, Dirty PTE which
+ * looks like a shadow stack PTE.
+ *
+ * However, this behavior has been improved and will not occur on
+ * processors supporting Shadow Stack. Without this guarantee, a
+ * transition to a non-present PTE and a TLB flush would be
+ * needed.
+ *
+ * When changing a writable PTE to read-only and if the PTE has
+ * _PAGE_DIRTY set, move that bit to _PAGE_COW so that the PTE is
+ * not a shadow stack PTE.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+ pte_t old_pte, new_pte;
+
+ do {
+ old_pte = READ_ONCE(*ptep);
+ new_pte = pte_wrprotect(old_pte);
+
+ } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+ return;
+ }
clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
}

@@ -1282,6 +1308,32 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
{
+ /*
+ * Some processors can start a write, but end up seeing a read-only
+ * PMD by the time they get to the Dirty bit. In this case, they
+ * will set the Dirty bit, leaving a read-only, Dirty PMD which
+ * looks like a Shadow Stack PMD.
+ *
+ * However, this behavior has been improved and will not occur on
+ * processors supporting Shadow Stack. Without this guarantee, a
+ * transition to a non-present PMD and a TLB flush would be
+ * needed.
+ *
+ * When changing a writable PMD to read-only and if the PMD has
+ * _PAGE_DIRTY set, move that bit to _PAGE_COW so that the PMD is
+ * not a shadow stack PMD.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+ pmd_t old_pmd, new_pmd;
+
+ do {
+ old_pmd = READ_ONCE(*pmdp);
+ new_pmd = pmd_wrprotect(old_pmd);
+
+ } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd)));
+
+ return;
+ }
clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
}

--
2.21.0