Re: [RFC PATCH v3 12/24] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW

From: Jann Horn
Date: Thu Aug 30 2018 - 17:47:46 EST


On Thu, Aug 30, 2018 at 11:01 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Thu, Aug 30, 2018 at 10:57 PM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx> wrote:
> >
> > On Thu, 2018-08-30 at 22:44 +0200, Jann Horn wrote:
> > > On Thu, Aug 30, 2018 at 10:25 PM Yu-cheng Yu <yu-cheng.yu@xxxxxxxxx>
> > > wrote:
> > ...
> > > > In the flow you described, if C writes to the overflow page before
> > > > B
> > > > gets in with a 'call', the return address is still correct for
> > > > B. To
> > > > make an attack, C needs to write again before the TLB flush. I
> > > > agree
> > > > that is possible.
> > > >
> > > > Assume we have a guard page, can someone in the short window do
> > > > recursive calls in B, move ssp to the end of the guard page, and
> > > > trigger the same again? He can simply take the incssp route.
> > > I don't understand what you're saying. If the shadow stack is
> > > between
> > > guard pages, you should never be able to move SSP past that area's
> > > guard pages without an appropriate shadow stack token (not even with
> > > INCSSP, since that has a maximum range of PAGE_SIZE/2), and
> > > therefore,
> > > it shouldn't matter whether memory outside that range is incorrectly
> > > marked as shadow stack. Am I missing something?
> >
> > INCSSP has a range of 256, but we can do multiple of that.
> > But I realize the key is not to have the transient SHSTK page at all.
> > The guard page is !pte_write() and even we have flaws in
> > ptep_set_wrprotect(), there will not be any transient SHSTK pages. I
> > will add guard pages to both ends.
> >
> > Still thinking how to fix ptep_set_wrprotect().
>
> cmpxchg loop? Or is that slow?

Something like this:

static inline void ptep_set_wrprotect(struct mm_struct *mm,
				      unsigned long addr, pte_t *ptep)
{
	pte_t pte = READ_ONCE(*ptep), new_pte;
	pteval_t old;

	/* ... your comment about not needing a TLB shootdown here ... */
	do {
		old = pte.pte;
		new_pte = pte_wrprotect(pte);
		/* note: relies on _PAGE_DIRTY_HW < _PAGE_DIRTY_SW */
		/* dirty direct bit-twiddling; you can probably write
		   this in a nicer way */
		new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
			_PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
		new_pte.pte &= ~_PAGE_DIRTY_HW;
		/* cmpxchg() returns the value it found in *ptep; retry if
		   the PTE changed under us (e.g. hardware set the dirty bit) */
		pte.pte = cmpxchg(&ptep->pte, old, new_pte.pte);
	} while (pte.pte != old);
}

I think this has the advantage of not generating weird spurious page faults.
It's not compatible with Xen PV, but I'm guessing that this whole
feature isn't going to support Xen PV anyway? So you could switch
between two implementations of ptep_set_wrprotect using the pvop
mechanism or so - one for environments that support shadow stacks, one
for all other environments.
Or is there some arcane reason why cmpxchg doesn't work here the way I
think it should?
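
To poke at the retry logic outside the kernel, here is a minimal userspace sketch of the same compare-and-swap loop using C11 atomics. It is an illustration only, not kernel code: the DEMO_* bit positions and the demo_* function names are made up for this example and do not match the real _PAGE_* definitions, and a plain 64-bit integer stands in for pte_t.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (not the kernel's). */
#define DEMO_BIT_RW		1
#define DEMO_BIT_DIRTY_HW	6
#define DEMO_BIT_DIRTY_SW	58
#define DEMO_RW		(UINT64_C(1) << DEMO_BIT_RW)
#define DEMO_DIRTY_HW	(UINT64_C(1) << DEMO_BIT_DIRTY_HW)
#define DEMO_DIRTY_SW	(UINT64_C(1) << DEMO_BIT_DIRTY_SW)

/* Compute the write-protected value: clear RW, move DIRTY_HW to DIRTY_SW. */
static uint64_t demo_wrprotect_val(uint64_t v)
{
	v &= ~DEMO_RW;
	if (v & DEMO_DIRTY_HW)
		v = (v & ~DEMO_DIRTY_HW) | DEMO_DIRTY_SW;
	return v;
}

/*
 * Retry until the swap succeeds, so the stored value is always derived
 * from the value that was actually replaced. Returns the replaced value.
 */
static uint64_t demo_set_wrprotect(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);

	/* On failure, the compare-exchange reloads 'old' with the current value. */
	while (!atomic_compare_exchange_weak(ptep, &old, demo_wrprotect_val(old)))
		;
	return old;
}
```

The point of the loop is that a concurrent update between the load and the swap (in the kernel case, the CPU setting the hardware dirty bit) makes the compare-exchange fail, so the new value gets recomputed from the fresh contents and there is never a window where write=0 and dirty-hw=1 is stored.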