Re: KVM: Warn if mark_page_dirty() is called without an active vCPU

From: Christian Borntraeger
Date: Thu Jan 13 2022 - 10:09:28 EST




Am 13.01.22 um 14:22 schrieb David Woodhouse:
On Thu, 2022-01-13 at 13:51 +0100, Christian Borntraeger wrote:
Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
the same use-after-free problem that kvm_map_gfn() used to have. It
probably wants converting to the new gfn_to_pfn_cache.

Take a look at how I resolve the same issue for delivering Xen event
channel interrupts.

Do you have a commit ID for your Xen event channel fix?

14243b387137 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event
channel delivery") and the commits reworking the gfn_to_pfn_cache which
lead up to it.

Questions: In your kvm_set_routing_entry() where it calls
gmap_translate() to turn the summary_addr and ind_addr from guest
addresses to userspace virtual addresses, what protects those
translations? If the gmap changes before kvm_set_routing_entry() even
returns, what ensures that the IRQ gets retranslated?

In the end the gmap translated between guest physical and host virtual, just
like the kvm memslots. This is done in kvm_arch_commit_memory_region. The gmap
is a method where we share the last level page table between the guest mapping
and the user mapping. That is why our memslots have to be on 1 MB boundary (our
page tables have 256 4k entries). So instead of gmap we could have used the
memslots as well for translation.

I have trouble seeing a kernel integrity issue: worst case is that we point to
a different address in the userspace mapping if qemu would change the memslots
maybe even to an invalid one. But then the access should fail (for invalid) or
you get a self-inflicted bit flips on your own address space if you play such
games.

Well, one thing: if QEMU changes memslots, it might need to redo the irqfd
registration as well as we do not follow these changes. Maybe this is something
that we could improve as future QEMUs could do more changes regarding memslots.


And later in adapter_indicators_set() where you take that userspace
virtual address and pass it to your get_map_page() function, the same
question: what if userspace does a concurrent mmap() and changes the
physical page that the userspace address points to?

In the latter case, at least it does look like you don't support
external memory accessed only by a PFN without having a corresponding
struct page. So at least you don't end up accessing a page that can now
belong to *another* guest, because the original underlying page is
locked. You're probably OK in that case, so it's just the gmap changing
that we need to focus on?