Re: [RFC PATCH 30/39] KVM: guest_memfd: Handle folio preparation for guest_memfd mmap

From: Manwaring, Derek
Date: Tue Oct 08 2024 - 23:51:36 EST


On 2024-10-08 at 19:56+0000 Sean Christopherson wrote:
> Another (slightly crazy) approach would be to use protection keys to provide the
> security properties that you want, while giving KVM (and userspace) a quick-and-easy
> override to access guest memory.
>
>  1. mmap() guest_memfd into userspace with RW protections
>  2. Configure PKRU to make guest_memfd memory inaccessible by default
>  3. Swizzle PKRU on-demand when intentionally accessing guest memory
>
> It's essentially the same idea as SMAP+STAC/CLAC, just applied to guest memory
> instead of to userspace memory.
>
> The benefit of the PKRU approach is that there are no PTE modifications, and thus
> no TLB flushes, and only the CPU that is accessing guest memory gains temporary
> access.  The big downside is that it would be limited to modern hardware, but
> that might be acceptable, especially if it simplifies KVM's implementation.

Yeah, this might be worth it if it simplifies KVM significantly. Jenkins
et al. showed MPK worked for stopping in-process Spectre v1 [1]. While
future hardware bugs are always possible, the host kernel would still
offer better protection overall since discovery of additional Spectre
approaches and gadgets in the kernel is more likely (I think the kernel
is a bigger attack surface than hardware-specific MPK transient
execution issues).
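
To make the three steps Sean lists concrete, here is a minimal userspace
sketch using the glibc pkey wrappers (pkey_alloc, pkey_mprotect,
pkey_set). It is only an illustration under stated assumptions: an
anonymous mapping stands in for the guest_memfd mapping, and pkru_deny,
pkru_allow, guarded_read and pkey_demo are hypothetical names of mine,
not anything from the series. It says nothing about how KVM itself
would do this.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* PKRU holds two bits per key: bit 2*key is Access-Disable (AD) and
 * bit 2*key+1 is Write-Disable (WD). These pure helpers only compute
 * the register value; they do not touch the real PKRU. */
static inline uint32_t pkru_deny(uint32_t pkru, int key)
{
    assert(key >= 0 && key < 16);
    return pkru | (UINT32_C(1) << (2 * key));      /* set AD */
}

static inline uint32_t pkru_allow(uint32_t pkru, int key)
{
    assert(key >= 0 && key < 16);
    return pkru & ~(UINT32_C(3) << (2 * key));     /* clear AD and WD */
}

/* Step 3: open an access window for the calling thread only, read one
 * byte, then close the window again. No PTE is modified here. */
static int guarded_read(int pkey, const volatile unsigned char *p,
                        unsigned char *out)
{
    if (pkey_set(pkey, 0) != 0)
        return -1;
    *out = *p;
    return pkey_set(pkey, PKEY_DISABLE_ACCESS);
}

/* Returns 0 on success, 1 if protection keys are unavailable (old CPU,
 * no kernel support, or no free key), -1 on any other error. */
int pkey_demo(void)
{
    const size_t len = 4096;
    unsigned char value = 0;
    int pkey, rc;

    /* Step 1 (stand-in): an anonymous RW mapping plays the role of the
     * mmap()ed guest_memfd region. */
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    mem[0] = 0x5a;

    /* Step 2: allocate a key that is access-disabled for this thread
     * and tag the mapping with it (one-time setup). */
    pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
    if (pkey < 0) {
        munmap(mem, len);
        return 1;
    }
    if (pkey_mprotect(mem, len, PROT_READ | PROT_WRITE, pkey) != 0) {
        munmap(mem, len);
        pkey_free(pkey);
        return -1;
    }

    /* Step 3: intentional access goes through the temporary window. */
    rc = guarded_read(pkey, mem, &value);

    munmap(mem, len);
    pkey_free(pkey);
    return (rc == 0 && value == 0x5a) ? 0 : -1;
}
```

Calling pkey_demo() from main() is enough to try it. Outside of
guarded_read(), a plain load from mem in the same thread would fault
with SIGSEGV while the key is access-disabled, which is the "inaccessible
by default" property from step 2.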

Patrick, we talked about this a couple of weeks ago and ended up
focusing on within-userspace protection, but I see keys can also be used
to stop kernel access, as in the project Andrew mentioned during Dave's
MPK session at LPC [2]. Andrew, could you share that here?

It's not clear to me how reliably the kernel prevents its own access to
such pages. I see a few papers that warrant more investigation:

"we found multiple interfaces that Linux, by design, provides for
accessing process memory that ignore PKU domains on a page." [3]

"Though Connor et al. demonstrate that existing MPK protections can be
bypassed by using the kernel as a confused deputy, compelling recent
work indicates that MPK operations can be made secure." [4]

Dave and others, if you're aware of resources clarifying how strong the
boundaries are, that would be helpful.

Derek


[1] https://www.cs.dartmouth.edu/~sws/pubs/jas2020.pdf
[2] https://www.youtube.com/watch?v=gEUeMfrNH94&t=1028s
[3] https://www.usenix.org/system/files/sec20-connor.pdf
[4] https://ics.uci.edu/~dabrowsa/kirth-eurosys22-pkru.pdf