Re: [PATCH Part2 RFC v4 35/40] KVM: Add arch hooks to track the host write to guest memory

From: Sean Christopherson
Date: Mon Jul 19 2021 - 22:37:42 EST

Next message: Yunsheng Lin: "[PATCH v2 1/4] tools headers UAPI: add cache aligning related macro"
Previous message: Darrick J. Wong: "Re: [PATCH] xfs: Convert from atomic_t to refcount_t on xlog_ticket->t_ref"
Next in thread: Brijesh Singh: "Re: [PATCH Part2 RFC v4 35/40] KVM: Add arch hooks to track the host write to guest memory"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Jul 07, 2021, Brijesh Singh wrote:
> The kvm_write_guest{_page} and kvm_vcpu_write_guest{_page} are used by
> the hypevisor to write to the guest memory. The kvm_vcpu_map() and
> kvm_map_gfn() are used by the hypervisor to map the guest memory and
> and access it later.
>
> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> either be a private or shared. A write from the hypervisor goes through
> the RMP checks. If hardware sees that hypervisor is attempting to write
> to a guest private page, then it triggers an RMP violation (i.e, #PF with
> RMP bit set).
>
> Enhance the KVM guest write helpers to invoke an architecture specific
> hooks (kvm_arch_write_gfn_{begin,end}) to track the write access from the
> hypervisor.
>
> When SEV-SNP is enabled, the guest uses the PAGE_STATE vmgexit to ask the
> hypervisor to change the page state from shared to private or vice versa.
> While changing the page state to private, use the
> kvm_host_write_track_is_active() to check whether the page is being
> tracked for the host write access (i.e either mapped or kvm_write_guest
> is in progress). If its tracked, then do not change the page state.
>
> Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx>
> ---

...

> @@ -3468,3 +3489,33 @@ int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level)
>
> return min_t(uint32_t, level, max_level);
> }
> +
> +void sev_snp_write_page_begin(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> + struct rmpentry *e;
> + int level, rc;
> + kvm_pfn_t pfn;
> +
> + if (!sev_snp_guest(kvm))
> + return;
> +
> + pfn = gfn_to_pfn(kvm, gfn);
> + if (is_error_noslot_pfn(pfn))
> + return;
> +
> + e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &level);
> + if (unlikely(!e))
> + return;
> +
> + /*
> + * A hypervisor should never write to the guest private page. A write to the
> + * guest private will cause an RMP violation. If the guest page is private,
> + * then make it shared.

NAK on converting RMP entries in response to guest accesses. Corrupting guest
data (due to dropping the "validated" flag) on a rogue/incorrect guest emulation
request or misconfigured PV feature is double ungood. The potential kernel panic
below isn't much better.

And I also don't think we need this heavyweight flow for user access, e.g.
__copy_to_user(), just eat the RMP violation #PF like all other #PFs and exit
to userspace with -EFAULT.

kvm_vcpu_map() and friends might need the manual lookup, at least initially, but
in an ideal world that would be naturally handled by gup(), e.g. by unmapping
guest private memory or whatever approach TDX ends up employing to avoid #MCs.

> + */
> + if (rmpentry_assigned(e)) {
> + pr_err("SEV-SNP: write to guest private gfn %llx\n", gfn);
> + rc = snp_make_page_shared(kvm_get_vcpu(kvm, 0),
> + gfn << PAGE_SHIFT, pfn, PG_LEVEL_4K);
> + BUG_ON(rc != 0);
> + }
> +}

...

> +void kvm_arch_write_gfn_begin(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> + update_gfn_track(slot, gfn, KVM_PAGE_TRACK_WRITE, 1);

Tracking only writes isn't correct, as KVM reads to guest private memory will
return garbage. Pulling the rug out from under KVM reads won't fail as
spectacularly as writes (at least not right away), but they'll still fail. I'm
actually ok reading garbage if the guest screws up, but KVM needs consistent
semantics.

Good news is that per-gfn tracking is probably overkill anyways. As mentioned
above, user access don't need extra magic, they either fail or they don't.

For kvm_vcpu_map(), one thought would be to add a "short-term" map variant that
is not allowed to be retained across VM-Entry, and then use e.g. SRCU to block
PSC requests until there are no consumers.

> + if (kvm_x86_ops.write_page_begin)
> + kvm_x86_ops.write_page_begin(kvm, slot, gfn);
> +}

Next message: Yunsheng Lin: "[PATCH v2 1/4] tools headers UAPI: add cache aligning related macro"
Previous message: Darrick J. Wong: "Re: [PATCH] xfs: Convert from atomic_t to refcount_t on xlog_ticket->t_ref"
Next in thread: Brijesh Singh: "Re: [PATCH Part2 RFC v4 35/40] KVM: Add arch hooks to track the host write to guest memory"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]