Re: [RFC PATCH v3 12/24] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW

From: Yu-cheng Yu
Date: Fri Sep 14 2018 - 16:43:35 EST


On Fri, 2018-08-31 at 18:29 +0200, Peter Zijlstra wrote:
> On Fri, Aug 31, 2018 at 08:58:39AM -0700, Dave Hansen wrote:
> >
> > On 08/31/2018 08:48 AM, Yu-cheng Yu wrote:
> > >
> > > To trigger a race in ptep_set_wrprotect(), we need to fork from one of
> > > three pthread siblings.
> > >
> > > Or do we measure only how much this affects fork?
> > > If there is no racing, the effect should be minimal.
> > We don't need a race.
> >
> > I think the cmpxchg will be slower, even without a race, than the code
> > that was there before.ÂÂThe cmpxchg is a simple, straightforward
> > solution, but we're putting it in place of a plain memory write, which
> > is suboptimal.
> Note quite, the clear_bit() is LOCK prefixed.

With the updated ptep_set_wrprotect() below, I did MADV_WILLNEED to a shadow
stack of 8 MB, then 10,000 fork()'s, but could not prove it is more or less
efficient than the other. ÂSo can we say this is probably fine in terms of
efficiency?

Yu-cheng




--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1203,7 +1203,36 @@ static inline pte_t ptep_get_and_clear_full(struct
mm_struct *mm,
Âstatic inline void ptep_set_wrprotect(struct mm_struct *mm,
 ÂÂÂÂÂÂunsigned long addr, pte_t *ptep)
Â{
+#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
+ pte_t new_pte, pte = READ_ONCE(*ptep);
+
+ /*
+ Â* Some processors can start a write, but end up
+ Â* seeing a read-only PTE by the time they get
+ Â* to the Dirty bit.ÂÂIn this case, they will
+ Â* set the Dirty bit, leaving a read-only, Dirty
+ Â* PTE which looks like a Shadow Stack PTE.
+ Â*
+ Â* However, this behavior has been improved and
+ Â* will not occur on processors supporting
+ Â* Shadow Stacks.ÂÂWithout this guarantee, a
+ Â* transition to a non-present PTE and flush the
+ Â* TLB would be needed.
+ Â*
+ Â* When changing a writable PTE to read-only and
+ Â* if the PTE has _PAGE_DIRTY_HW set, we move
+ Â* that bit to _PAGE_DIRTY_SW so that the PTE is
+ Â* not a valid Shadow Stack PTE.
+ Â*/
+ do {
+ new_pte = pte_wrprotect(pte);
+ new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
+ _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
+ new_pte.pte &= ~_PAGE_DIRTY_HW;
+ } while (!try_cmpxchg(ptep, &pte, new_pte));
+#else
 clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+#endif
Â}