Re: [RFC PATCH v3 12/24] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW

From: Jann Horn
Date: Thu Aug 30 2018 - 11:50:24 EST


On Thu, Aug 30, 2018 at 4:43 PM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx> wrote:
>
> When Shadow Stack is enabled, the read-only and PAGE_DIRTY_HW PTE
> setting is reserved only for the Shadow Stack. To track dirty of
> non-Shadow Stack read-only PTEs, we use PAGE_DIRTY_SW.
>
> Update ptep_set_wrprotect() and pmdp_set_wrprotect().
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> ---
> arch/x86/include/asm/pgtable.h | 42 ++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 4d50de77ea96..556ef258eeff 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1203,7 +1203,28 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
> static inline void ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> + pte_t pte;
> +
> clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> + pte = *ptep;
> +
> + /*
> + * Some processors can start a write, but ending up seeing
> + * a read-only PTE by the time they get to the Dirty bit.
> + * In this case, they will set the Dirty bit, leaving a
> + * read-only, Dirty PTE which looks like a Shadow Stack PTE.
> + *
> + * However, this behavior has been improved and will not occur
> + * on processors supporting Shadow Stacks. Without this
> + * guarantee, a transition to a non-present PTE and flush the
> + * TLB would be needed.
> + *
> + * When change a writable PTE to read-only and if the PTE has
> + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so
> + * that the PTE is not a valid Shadow Stack PTE.
> + */
> + pte = pte_move_flags(pte, _PAGE_DIRTY_HW, _PAGE_DIRTY_SW);
> + set_pte_at(mm, addr, ptep, pte);
> }

I don't understand why it's okay that you first atomically clear the
RW bit, then atomically switch from DIRTY_HW to DIRTY_SW. Doesn't that
mean that between the two atomic writes, another core can incorrectly
see a shadow stack?