Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

From: Ankur Arora
Date: Wed Dec 02 2020 - 15:36:52 EST


On 2020-12-02 2:44 a.m., Joao Martins wrote:
[late response - was on holiday yesterday]

On 12/2/20 12:40 AM, Ankur Arora wrote:
On 2020-12-01 5:07 a.m., David Woodhouse wrote:
On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
+static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+{
+	struct shared_info *shared_info;
+	struct page *page;
+
+	page = gfn_to_page(kvm, gfn);
+	if (is_error_page(page))
+		return -EINVAL;
+
+	kvm->arch.xen.shinfo_addr = gfn;
+
+	shared_info = page_to_virt(page);
+	memset(shared_info, 0, sizeof(struct shared_info));
+	kvm->arch.xen.shinfo = shared_info;
+	return 0;
+}
+

Hm.

How come we get to pin the page and directly dereference it every time,
while kvm_setup_pvclock_page() has to use kvm_write_guest_cached()
instead?

Looking at my WIP trees from the time, this is something that we went
back and forth on as well: whether to use just a pinned page or a
persistent kvm_vcpu_map().

I remember distinguishing shared_info/vcpu_info from kvm_setup_pvclock_page()
because shared_info is created early and is not expected to change during the
lifetime of the guest. That didn't seem true for MSR_KVM_SYSTEM_TIME (or
MSR_KVM_STEAL_TIME), so those would need either a
kvm_vcpu_map()/kvm_vcpu_unmap() dance or some kind of synchronization.

That said, I don't think this code explicitly disallows any updates
to shared_info.


If that was allowed, wouldn't it have been a much simpler fix for
CVE-2019-3016? What am I missing?

Agreed.

Perhaps Paolo can chime in on why KVM never uses pinned pages
and always prefers to do cached mappings instead?

Part of the CVE fix was to not use the cached versions.

It's not a long-term pin of the page, unlike what we try to do here (partly due to the
nature of the pages we are mapping), but we still map the gpa, RMW the steal time struct,
and then unmap the page.

See record_steal_time() -- but more specifically commit b043138246 ("x86/KVM: Make sure
KVM_VCPU_FLUSH_TLB flag is not missed").

But I am not sure it's a good idea to follow the same approach as record_steal_time(),
given that this is a fairly sensitive code path for event channels.


Should I rework these to use kvm_write_guest_cached()?

kvm_vcpu_map() would be better. The event channel logic does RMW operations
on shared_info->vcpu_info.

Indeed, yes.

Ankur, IIRC we saw missed event channel notifications when we were using the
{write,read}_cached() version of the patch.

But I can't remember which word was the cause, the evtchn_pending or the mask
word -- which would make it not inject an upcall.

If memory serves, it was the mask. Though I don't think that we had
kvm_{write,read}_cached in use at that point -- given that they were
definitely not RMW safe.
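
For anyone following along, here is a rough userspace sketch of the lost
update being described. It is not KVM code: struct fake_guest_page,
copy_read(), copy_write() and mapped_set_bit() are hypothetical stand-ins
(a copy-in/copy-out accessor pair versus an atomic op on a mapped page),
and the interleaving is replayed sequentially rather than with real
concurrency.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one guest-visible word, e.g. an evtchn mask word. */
struct fake_guest_page {
	_Atomic uint64_t evtchn_mask;
};

/* Copy-based access: models "read a private copy" and "write the copy back". */
static uint64_t copy_read(struct fake_guest_page *pg)
{
	return atomic_load(&pg->evtchn_mask);
}

static void copy_write(struct fake_guest_page *pg, uint64_t val)
{
	atomic_store(&pg->evtchn_mask, val);
}

/* Mapped access: one atomic read-modify-write directly on the shared word. */
static void mapped_set_bit(struct fake_guest_page *pg, unsigned int bit)
{
	atomic_fetch_or(&pg->evtchn_mask, UINT64_C(1) << bit);
}

/*
 * Host sets bit 0 through a private copy while the guest sets bit 1
 * between the host's read and write. The stale copy overwrites bit 1.
 */
static uint64_t lost_update_with_copies(void)
{
	struct fake_guest_page pg;
	uint64_t snapshot;

	atomic_init(&pg.evtchn_mask, 0);
	snapshot = copy_read(&pg);		/* host: reads 0 */
	mapped_set_bit(&pg, 1);			/* guest: sets bit 1 */
	copy_write(&pg, snapshot | 1);		/* host: writes back stale value */
	return atomic_load(&pg.evtchn_mask);
}

/* Both sides use an atomic RMW on the same word, so neither bit is lost. */
static uint64_t no_lost_update_with_mapping(void)
{
	struct fake_guest_page pg;

	atomic_init(&pg.evtchn_mask, 0);
	mapped_set_bit(&pg, 1);			/* guest */
	mapped_set_bit(&pg, 0);			/* host */
	return atomic_load(&pg.evtchn_mask);
}
```

In this sketch lost_update_with_copies() ends with only bit 0 set, while
no_lost_update_with_mapping() ends with both bits set. Which word was
actually hit in our testing is a separate question; the sketch only
illustrates why a copy-based accessor pair cannot be used for RMW.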


Ankur


Joao