Re: [PATCH RFC 03/39] KVM: x86/xen: register shared_info page

From: David Woodhouse
Date: Thu Dec 03 2020 - 05:17:09 EST


On Wed, 2020-12-02 at 12:32 -0800, Ankur Arora wrote:
> > On IRC, Paolo told me that permanent pinning causes problems for memory
> > hotplug, and pointed me at the trick we do with an MMU notifier and
> > kvm_vcpu_reload_apic_access_page().
>
> Okay that answers my question. Thanks for clearing that up.
>
> Not sure of a good place to document this but it would be good to
> have this written down somewhere. Maybe kvm_map_gfn()?

Trying not to get too distracted by polishing this part, so I can
continue with making more things actually work. But I took a quick look
at the reload_apic_access_page() thing.

AFAICT it works because the access is only from *within* the vCPU, in
guest mode.

So all the notifier has to do is kick all CPUs, which happens when it
calls kvm_make_all_cpus_request(). Thus we are guaranteed that all CPUs
are *out* of guest mode by the time...

...er... maybe not by the time the notifier returns, because all
we've done is *send* the IPI and we don't know the other CPUs have
actually stopped running the guest yet?

Maybe there's some explanation of why the actual TLB shootdown
truly *will* occur before the page goes away, and some ordering
rules which mean our reschedule IPI will happen first? Something
like that ideally would have been in a comment in in MMU notifier.

Anyway, I'm not going to fret about that because it clearly doesn't
work for the pvclock/shinfo cases anyway; those are accessed from the
kernel *outside* guest mode. I think we can use a similar trick but
just protect the accesses with kvm->srcu.

The code in my tree now uses kvm_map_gfn() at setup time and leaves the
shinfo page mapped. A bit like the KVM pvclock code, except that I
don't pointlessly map and unmap all the time while leaving the page
pinned anyway.

So all we need to do is kvm_release_pfn() right after mapping it,
leaving the map->hva "unsafe".

Then we hook in to the MMU notifier to unmap it when it goes away (in
RCU style; clearing the pointer variable, synchronize_srcu(), and
*then* unmap, possibly in different range_start/range/range_end
callbacks).

The *users* of such mappings will need an RCU read lock, and will need
to be able to fault the GFN back in when they need it.

For now, though, I'm content that converting the Xen shinfo support to
kvm_map_gfn() is the right way to go, and the memory hotplug support is
an incremental addition on top of it.

And I'm even more comforted by the observation that the KVM pvclock
support *already* pins its pages, so I'm not even failing to meet the
existing bar :)

Attachment: smime.p7s
Description: S/MIME cryptographic signature