Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn map and unmap

From: Sean Christopherson
Date: Fri Oct 15 2021 - 13:16:37 EST


On Fri, Oct 15, 2021, Brijesh Singh wrote:
>
> On 10/13/21 1:16 PM, Sean Christopherson wrote:
> > On Wed, Oct 13, 2021, Sean Christopherson wrote:
> >> On Fri, Aug 20, 2021, Brijesh Singh wrote:
> >>> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> >>> either be a private or shared. A write from the hypervisor goes through
> >>> the RMP checks. If hardware sees that hypervisor is attempting to write
> >>> to a guest private page, then it triggers an RMP violation #PF.
> >>>
> >>> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> >>> used to verify that its safe to map a given guest page. Use the SRCU to
> >>> protect against the page state change for existing mapped pages.
> >> SRCU isn't protecting anything. The synchronize_srcu_expedited() in the PSC code
> >> forces it to wait for existing maps to go away, but it doesn't prevent new maps
> >> from being created while the actual RMP updates are in-flight. Most telling is
> >> that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
> > Argh, another goof on my part. Rereading prior feedback, I see that I loosely
> > suggested SRCU as a possible solution. That was a bad, bad suggestion. I think
> > (hope) I made it offhand without really thinking it through. SRCU can't work in
> > this case, because the whole premise of Read-Copy-Update is that there can be
> > multiple copies of the data. That simply can't be true for the RMP as hardware
> > operates on a single table.
> >
> > In the future, please don't hesitate to push back on and/or question suggestions,
> > especially those that are made without concrete examples, i.e. are likely off the
> > cuff. My goal isn't to set you up for failure :-/
>
> What do you think about going back to my initial proposal of per-gfn
> tracking [1] ? We can limit the changes to just for the kvm_vcpu_map()
> and let the copy_to_user() take a fault and return an error (if it
> attempt to write to guest private). If PSC happen while lock is held
> then simplify return and let the guest retry PSC.

That approach is also broken as it doesn't hold a lock when updating host_write_track,
e.g. the count can be corrupted if writers collide, and nothing blocks writers on
in-progress readers.

I'm not opposed to a scheme that blocks PSC while KVM is reading, but I don't want
to spend time iterating on the KVM case until consensus has been reached on how
exactly RMP updates will be handled, and in general how the kernel will manage
guest private memory.