Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

From: Laurent Dufour
Date: Thu Jul 23 2020 - 10:07:05 EST


On 23/07/2020 at 14:32, Laurent Dufour wrote:
On 23/07/2020 at 05:36, Bharata B Rao wrote:
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
When a secure memslot is dropped, all the pages backed in the secure device
(aka really backed by secure memory by the Ultravisor) should be paged out
to a normal page. Previously, this was achieved by triggering the page
fault mechanism, which calls kvmppc_svm_page_out() on each page.

This can't work when hot unplugging a memory slot because the memory slot
is flagged as invalid and gfn_to_pfn() is then not trying to access the
page, so the page fault mechanism is not triggered.

Since the final goal is to make a call to kvmppc_svm_page_out(), it seems
simpler to call it directly instead of triggering such a mechanism. This
way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
memslot.

Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
the call to __kvmppc_svm_page_out() is made.
As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
addition, the mmap_sem is held in read mode during that time, not in write
mode since the virtual memory layout is not impacted, and
kvm->arch.uvmem_lock prevents concurrent operation on the secure device.

Cc: Ram Pai <linuxram@xxxxxxxxxx>
Cc: Bharata B Rao <bharata@xxxxxxxxxxxxx>
Cc: Paul Mackerras <paulus@xxxxxxxxxx>
Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxx>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 54 ++++++++++++++++++++----------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 5a4b02d3f651..ba5c7c77cc3a 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -624,35 +624,55 @@ static inline int kvmppc_svm_page_out(struct vm_area_struct *vma,
  * fault on them, do fault time migration to replace the device PTEs in
  * QEMU page table with normal PTEs from newly allocated pages.
  */
-void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
+void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,
                              struct kvm *kvm, bool skip_page_out)
 {
     int i;
     struct kvmppc_uvmem_page_pvt *pvt;
-    unsigned long pfn, uvmem_pfn;
-    unsigned long gfn = free->base_gfn;
+    struct page *uvmem_page;
+    struct vm_area_struct *vma = NULL;
+    unsigned long uvmem_pfn, gfn;
+    unsigned long addr, end;
+
+    mmap_read_lock(kvm->mm);
+
+    addr = slot->userspace_addr;

We typically use gfn_to_hva() for that, but that won't work for a
memslot that is already marked INVALID, which is the case here.
I think it is ok to access slot->userspace_addr of an INVALID
memslot here, but I just thought of explicitly bringing this up.

Which is explicitly mentioned above in the patch's description:

This can't work when hot unplugging a memory slot because the memory slot
is flagged as invalid and gfn_to_pfn() is then not trying to access the
page, so the page fault mechanism is not triggered.


+    end = addr + (slot->npages * PAGE_SIZE);
-    for (i = free->npages; i; --i, ++gfn) {
-        struct page *uvmem_page;
+    gfn = slot->base_gfn;
+    for (i = slot->npages; i; --i, ++gfn, addr += PAGE_SIZE) {
+
+        /* Fetch the VMA if addr is not in the latest fetched one */
+        if (!vma || (addr < vma->vm_start || addr >= vma->vm_end)) {
+            vma = find_vma_intersection(kvm->mm, addr, end);
+            if (!vma ||
+                vma->vm_start > addr || vma->vm_end < end) {
+                pr_err("Can't find VMA for gfn:0x%lx\n", gfn);
+                break;
+            }
+        }

In Ram's series, kvmppc_memslot_page_merge() also walks the VMAs spanning
the memslot, but it uses a different logic for the same. Why can't these
two cases use the same method to walk the VMAs? Is there anything subtly
different between the two cases?

This is probably doable. At the time I wrote that patch, kvmppc_memslot_page_merge() had not yet been introduced, AFAIR.

That being said, it'd help a lot to factor out that code... I'll let Ram deal with that ;)

Actually, I don't think this is relevant: the loop in kvmppc_memslot_page_merge() makes one call (to ksm_advise) per VMA, while this code makes one call per page of the VMA, which is completely different.

I don't think merging the two would be a good idea.
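
To make the difference concrete, here is a rough user-space sketch of the two walk shapes. It is not kernel code: struct range, find_range(), walk_per_range() and walk_per_page() are hypothetical stand-ins for the VMA, find_vma_intersection() and the two loops, and the per-page lookup only asks for the current page, which is a simplification compared to the hunk above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size, for the sketch only */

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct range {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical lookup: first range of a sorted array overlapping [lo, hi). */
static const struct range *find_range(const struct range *r, size_t n,
                                      unsigned long lo, unsigned long hi)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].start < hi && lo < r[i].end)
            return &r[i];
    return NULL;
}

/* Per-range walk: one operation for each range overlapping [addr, end). */
static size_t walk_per_range(const struct range *r, size_t n,
                             unsigned long addr, unsigned long end)
{
    size_t calls = 0;

    while (addr < end) {
        const struct range *cur = find_range(r, n, addr, end);

        if (!cur)
            break;
        calls++;          /* the whole range is handled in one call */
        addr = cur->end;  /* jump straight to the end of that range */
    }
    return calls;
}

/*
 * Per-page walk: one operation per page, with the range cached and only
 * looked up again once addr leaves it. Stops at the first unmapped page.
 */
static size_t walk_per_page(const struct range *r, size_t n,
                            unsigned long addr, unsigned long end,
                            size_t *lookups)
{
    const struct range *cur = NULL;
    size_t calls = 0;

    assert(lookups != NULL);
    *lookups = 0;
    for (; addr < end; addr += SKETCH_PAGE_SIZE) {
        if (!cur || addr < cur->start || addr >= cur->end) {
            cur = find_range(r, n, addr, addr + SKETCH_PAGE_SIZE);
            (*lookups)++;
            if (!cur)
                break;
        }
        calls++;          /* one call for this page only */
    }
    return calls;
}
```

With two adjacent ranges of 2 and 3 pages, the per-range walk makes 2 calls, while the per-page walk makes 5 calls and only 2 lookups. The loop bodies share the lookup but not much else, which is why I would keep them separate.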

Cheers,
Laurent.