Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)

From: Tycho Andersen
Date: Wed Sep 20 2017 - 20:09:10 EST


On Wed, Sep 20, 2017 at 04:21:15PM -0700, Dave Hansen wrote:
> On 09/20/2017 03:34 PM, Tycho Andersen wrote:
> >> I really have to wonder whether there are better ret2dir defenses than
> >> this. The allocator just seems like the *wrong* place to be doing this
> >> because it's such a hot path.
> >
> > This might be crazy, but what if we defer flushing of the kernel
> > ranges until just before we return to userspace? We'd still manipulate
> > the prot/xpfo bits for the pages, but then just keep a list of which
> > ranges need to be flushed, and do the right thing before we return.
> > This leaves a little window between the actual allocation and the
> > flush, but userspace would need another thread in its threadgroup to
> > predict the next allocation, write the bad stuff there, and do the
> > exploit all in that window.
>
> I think the common case is still that you enter the kernel, allocate a
> single page (or very few) and then exit. So, you don't really reduce
> the total number of flushes.
>
> Just think of this in terms of IPIs to do the remote TLB flushes. A CPU
> can do roughly 1 million page faults and allocations a second. Say you
> have a 2-socket x 28-core x 2 hyperthread system = 112 CPU threads.
> That's 111M IPI interrupts/second, just for the TLB flushes, *ON* *EACH*
> *CPU*.

Since we only need to flush when something switches from a userspace
to a kernel page or back, hopefully it's not this bad, but point
taken.
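
For reference, the arithmetic behind that worst-case figure can be sketched
as below. This is only a back-of-the-envelope model, not kernel code: the
helper names are made up for illustration, and it assumes every allocation
triggers a flush that is broadcast to every other CPU thread.

```c
#include <assert.h>

/* Hypothetical helper: number of CPU threads in the machine. */
static unsigned int total_cpu_threads(unsigned int sockets,
				      unsigned int cores_per_socket,
				      unsigned int threads_per_core)
{
	return sockets * cores_per_socket * threads_per_core;
}

/*
 * Hypothetical helper: IPIs received per second by one CPU, assuming each
 * of the other CPUs does allocs_per_sec allocations and each allocation
 * sends one TLB-flush IPI to every other CPU.
 */
static unsigned long long ipis_received_per_cpu(unsigned int sockets,
						unsigned int cores_per_socket,
						unsigned int threads_per_core,
						unsigned long long allocs_per_sec)
{
	unsigned int cpus = total_cpu_threads(sockets, cores_per_socket,
					      threads_per_core);

	assert(cpus > 0);
	/* Every CPU except this one is a potential sender. */
	return (unsigned long long)(cpus - 1) * allocs_per_sec;
}
```

With 2 sockets, 28 cores, 2 threads and 1 million allocations a second,
this gives 112 CPU threads and 111M IPIs per second on each CPU. Flushing
only on user/kernel transitions would scale down allocs_per_sec, not the
(cpus - 1) factor.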

> I think the only thing that will really help here is if you batch the
> allocations. For instance, you could make sure that the per-cpu-pageset
> lists always contain either all kernel or all user data. Then remap the
> entire list at once and do a single flush after the entire list is consumed.

Just so I understand, the idea would be that we only flush when the
type of allocation alternates, so:

kmalloc(..., GFP_KERNEL);
kmalloc(..., GFP_KERNEL);
/* remap+flush here */
kmalloc(..., GFP_HIGHUSER);
/* remap+flush here */
kmalloc(..., GFP_KERNEL);

?
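
To make the question concrete, here is a toy model that only counts
flushes under that reading. It is a sketch, not a proposal for the actual
implementation: the enum and the function name are invented here, and it
assumes the mapping state already matches the first allocation, so the
first allocation costs no flush.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of an allocation, for illustration only. */
enum alloc_kind { ALLOC_KERNEL, ALLOC_USER };

/*
 * Count remap+flush events if we flush only when the kind of allocation
 * differs from the previous one.
 */
static size_t count_flushes_on_alternation(const enum alloc_kind *seq,
					   size_t n)
{
	size_t flushes = 0;
	size_t i;

	assert(seq != NULL || n == 0);

	for (i = 1; i < n; i++) {
		/* A kernel->user or user->kernel switch costs one flush. */
		if (seq[i] != seq[i - 1])
			flushes++;
	}
	return flushes;
}
```

For the sequence above (kernel, kernel, user, kernel) this counts 2
flushes, one at each "remap+flush here" comment. A run of same-type
allocations costs nothing extra, while a strictly alternating workload
still pays one flush per allocation.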

Tycho