Re: Redoing eXclusive Page Frame Ownership (XPFO) with isolated CPUs in mind (for KVM to isolate its guests per CPU)

From: Khalid Aziz
Date: Wed Oct 24 2018 - 07:01:04 EST

On 10/15/2018 01:37 PM, Khalid Aziz wrote:
> On 09/24/2018 08:45 AM, Stecklina, Julian wrote:
>> I didn't test the version with TLB flushes, because it's clear that the
>> overhead is so bad that no one wants to use this.
>
> I don't think we can ignore the vulnerability caused by not flushing stale TLB entries. On a mostly idle system, TLB entries hang around long enough to make it fairly easy to exploit this. I was able to use the additional test in the lkdtm module added by this patch series to successfully read pages unmapped from physmap by just waiting for the system to become idle. A rogue program can simply monitor system load and mount its attack using a ret2dir exploit when the system is mostly idle. This brings us back to the prohibitive cost of TLB flushes. If we are unmapping a page from physmap every time the page is allocated to userspace, we are forced to incur the cost of TLB flushes in some way. The work Tycho was doing to implement Dave's suggestion can help here. Once Tycho has something working, I can measure the overhead on my test machine. Tycho, I can help with your implementation if you need.
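
To make the stale-entry problem concrete, here is a toy model in Python. It is only a sketch: ToyMMU and stale_read_possible are made-up names, the "TLB" is a plain dictionary, and nothing here reflects how XPFO or real hardware is implemented. It shows only the general point that a translation cached before an unmap stays usable until it is flushed.

```python
class ToyMMU:
    """Toy model (hypothetical): a page table plus a TLB caching translations."""

    def __init__(self):
        self.page_table = {}  # virtual page -> physical frame
        self.tlb = {}         # cached copies of page-table entries

    def map(self, vpage, frame):
        self.page_table[vpage] = frame

    def unmap(self, vpage, flush):
        # Remove the mapping; optionally drop the cached translation too.
        del self.page_table[vpage]
        if flush:
            self.tlb.pop(vpage, None)

    def access(self, vpage):
        # A TLB hit never consults the page table, so a stale entry still works.
        if vpage in self.tlb:
            return self.tlb[vpage]
        frame = self.page_table.get(vpage)
        if frame is None:
            raise LookupError("page fault")
        self.tlb[vpage] = frame
        return frame


def stale_read_possible(flush):
    """Return True if a page is still reachable after being unmapped."""
    mmu = ToyMMU()
    mmu.map(0x1000, 42)
    mmu.access(0x1000)              # warm the TLB
    mmu.unmap(0x1000, flush=flush)  # unmap, with or without a flush
    try:
        mmu.access(0x1000)
        return True
    except LookupError:
        return False
```

In this model, stale_read_possible(False) succeeds in reading the unmapped page, while stale_read_possible(True) faults. On an idle system nothing else evicts the cached entry, which is why waiting for idle makes the window easy to hit.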

I looked at Tycho's last patch with batch update from <>. I ported it on top of Julian's patches and got it working well enough to gather performance numbers. Here are the system times I see on a machine with dual Xeon E5-2630 CPUs and 256GB of memory when running "make -j30 all" on the 4.18.6 kernel source (percentages are relative to the base 4.19-rc8 kernel without xpfo):

Configuration                           System time  vs. base
Base 4.19-rc8                               913.84s
4.19-rc8 + xpfo, no TLB flush             1027.985s    +12.5%
4.19-rc8 + batch update, no TLB flush       970.39s     +6.2%
4.19-rc8 + xpfo, TLB flush                8458.449s   +825.6%
4.19-rc8 + batch update, TLB flush        4665.659s   +410.6%

Batch update is a significant improvement, but we start so far behind the baseline that it is still a huge slowdown.
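
For anyone who wants to recheck the percentages, the arithmetic is just (time - base) / base. A short Python sketch follows; the variable and function names are mine, and the timings are copied from the numbers above.

```python
# System times in seconds, copied from the measurements above.
BASE = 913.84
RUNS = {
    "xpfo, no TLB flush": 1027.985,
    "batch update, no TLB flush": 970.39,
    "xpfo, TLB flush": 8458.449,
    "batch update, TLB flush": 4665.659,
}


def overhead_pct(seconds, base=BASE):
    """Overhead relative to the base kernel, in percent, to one decimal."""
    return round((seconds - base) / base * 100, 1)


# Percent overhead for each configuration.
overheads = {name: overhead_pct(t) for name, t in RUNS.items()}

for name, pct in overheads.items():
    print(f"{name:30s} +{pct}%")
```

With TLB flushes enabled, batching cuts system time from 8458.449s to 4665.659s, roughly a 45% reduction, yet the result is still about 5.1 times the baseline.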