Re: Redoing eXclusive Page Frame Ownership (XPFO) with isolated CPUs in mind (for KVM to isolate its guests per CPU)
From: Tycho Andersen
Date: Wed Oct 24 2018 - 11:00:49 EST
On Wed, Oct 24, 2018 at 04:30:42PM +0530, Khalid Aziz wrote:
> On 10/15/2018 01:37 PM, Khalid Aziz wrote:
> > On 09/24/2018 08:45 AM, Stecklina, Julian wrote:
> > > I didn't test the version with TLB flushes, because it's clear that the
> > > overhead is so bad that no one wants to use this.
> > I don't think we can ignore the vulnerability caused by not flushing
> > stale TLB entries. On a mostly idle system, TLB entries hang around long
> > enough to make it fairly easy to exploit this. I was able to use the
> > additional test in the lkdtm module added by this patch series to
> > successfully read pages unmapped from physmap by just waiting for the
> > system to become idle. A rogue program can simply monitor system load and
> > mount its attack using a ret2dir exploit when the system is mostly idle. This brings
> > us back to the prohibitive cost of TLB flushes. If we are unmapping a
> > page from physmap every time the page is allocated to userspace, we are
> > forced to incur the cost of TLB flushes in some way. Work Tycho was
> > doing to implement Dave's suggestion can help here. Once Tycho has
> > something working, I can measure overhead on my test machine. Tycho, I
> > can help with your implementation if you need.
> I looked at Tycho's last patch with batch update from
> <https://lkml.org/lkml/2017/11/9/951>. I ported it on top of Julian's
> patches and got it working well enough to gather performance numbers. Here
> is what I see for system times on a machine with dual Xeon E5-2630 and 256GB
> of memory when running "make -j30 all" on 4.18.6 kernel (percentages are
> relative to base 4.19-rc8 kernel without xpfo):
>
> Base 4.19-rc8                              913.84s
> 4.19-rc8 + xpfo, no TLB flush             1027.985s  (+12.5%)
> 4.19-rc8 + batch update, no TLB flush      970.39s   (+6.2%)
> 4.19-rc8 + xpfo, TLB flush                8458.449s  (+825.6%)
> 4.19-rc8 + batch update, TLB flush        4665.659s  (+410.6%)
>
> Batch update is a significant improvement, but we are starting so far
> behind the baseline that it is still a huge slowdown.
There's some other stuff that Dave suggested that I didn't do; in
particular, coalescing xpfo bits instead of setting things once per page
when mappings are shared, etc.
Perhaps that will help more?
I'm still stuck working on something else for now, but I hope to be
able to participate more on this Soon (TM). Thanks for the testing!
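
For anyone checking the arithmetic, here is a minimal sketch that recomputes the percentages in the quoted table from the raw system times. The times are copied from the table above; the dictionary keys and the `overhead_pct` helper are names made up for illustration and are not part of any patch.

```python
# System times in seconds, copied from the quoted table.
BASE = 913.84  # base 4.19-rc8 kernel without xpfo

runs = {
    "xpfo, no TLB flush": 1027.985,
    "batch update, no TLB flush": 970.39,
    "xpfo, TLB flush": 8458.449,
    "batch update, TLB flush": 4665.659,
}


def overhead_pct(t, base=BASE):
    """Slowdown of t relative to base, as a percentage of base."""
    return (t - base) / base * 100.0


# Round to one decimal place, matching the precision used in the table.
overheads = {name: round(overhead_pct(t), 1) for name, t in runs.items()}

for name, pct in overheads.items():
    print(f"4.19-rc8 + {name:28s} +{pct:.1f}%")
```

The same numbers also show that, with TLB flushes enabled, batching cuts system time from 8458.449s to 4665.659s, which is a large reduction that still leaves the run at roughly five times the baseline.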