Re: [PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()

From: Suzuki K Poulose
Date: Mon Dec 10 2018 - 05:47:47 EST

On 10/12/2018 08:56, Christoffer Dall wrote:
On Mon, Dec 03, 2018 at 01:37:37PM +0000, Suzuki K Poulose wrote:
Hi Anshuman,

On 03/12/2018 12:11, Anshuman Khandual wrote:

On 10/31/2018 11:27 PM, Punit Agrawal wrote:
The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.

Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them across the different pagesizes.

Signed-off-by: Punit Agrawal <punit.agrawal@xxxxxxx>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
Cc: Christoffer Dall <christoffer.dall@xxxxxxx>
Cc: Marc Zyngier <marc.zyngier@xxxxxxx>
virt/kvm/arm/mmu.c | 49 ++++++++++++++++++++++++++++------------------
1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 5eca48bdb1a6..59595207c5e1 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
unsigned long fault_status)
int ret;
- bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
+ bool write_fault, exec_fault, writable, force_pte = false;
unsigned long mmu_seq;
gfn_t gfn = fault_ipa >> PAGE_SHIFT;
struct kvm *kvm = vcpu->kvm;
@@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
- unsigned long flags = 0;
+ unsigned long vma_pagesize, flags = 0;

A small nit: s/vma_pagesize/pagesize/. Why call it VMA? It's implicit.

Maybe we could call it mapsize; pagesize is confusing.

I'm ok with mapsize. I see the vma_pagesize name coming from the fact
that this is initially set to the return value from vma_kernel_pagesize.

I have no problems with either vma_pagesize or mapsize.

write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
- if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
- hugetlb = true;
+ vma_pagesize = vma_kernel_pagesize(vma);
+ if (vma_pagesize == PMD_SIZE && !logging_active) {
gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
} else {
+ /*
+ * Fallback to PTE if it's not one of the Stage 2
+ * supported hugepage sizes
+ */
+ vma_pagesize = PAGE_SIZE;

This seems redundant and should be dropped. vma_kernel_pagesize() here either
calls hugetlb_vm_op_pagesize() (via hugetlb_vm_ops->pagesize) or simply returns
PAGE_SIZE, depending on whether the QEMU VMA covering a given HVA is backed by
HugeTLB pages or by normal pages. vma_pagesize would therefore have a value of
either PMD_SIZE (based on the HugeTLB hstate) or PAGE_SIZE. Hence if it's not
PMD_SIZE, it must be PAGE_SIZE, which should not be assigned again.

We may want to force PTE mappings when logging_active is set (e.g., during
migration?) to avoid keeping track of huge pages. So the check is still valid.

Agreed, and let's not try to additionally change the logic and flow in
this patch.
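To make the point about the fallback concrete, here is a minimal user-space model of the size selection in the hunk above. It assumes a 4K granule (PAGE_SIZE 4K, PMD_SIZE 2M, PUD_SIZE 1G); struct vma_model and stage2_map_size() are hypothetical stand-ins, not kernel code. The model shows that vma_pagesize can still hold PMD_SIZE on the else path when logging_active is set, and that a HugeTLB size other than PMD_SIZE (e.g. a 1G VMA, which the patch comment's "not one of the Stage 2 supported hugepage sizes" allows for) also lands there, so the reassignment to PAGE_SIZE is what keeps the mapping at PTE level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values for a 4K-granule configuration (assumed). */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)   /* 4 KiB */
#define PMD_SIZE   (UINT64_C(1) << 21)           /* 2 MiB */
#define PUD_SIZE   (UINT64_C(1) << 30)           /* 1 GiB */
#define PMD_MASK   (~(PMD_SIZE - 1))

/*
 * Hypothetical stand-in for a VMA: only the value vma_kernel_pagesize()
 * would report for it (hstate size for HugeTLB, else PAGE_SIZE).
 */
struct vma_model {
	uint64_t kernel_pagesize;
};

/*
 * Model of the patched block in user_mem_abort(): returns the stage 2
 * mapping size and stores the gfn the mapping starts at.
 */
static uint64_t stage2_map_size(const struct vma_model *vma,
				bool logging_active, uint64_t fault_ipa,
				uint64_t *gfn)
{
	uint64_t vma_pagesize = vma->kernel_pagesize;

	/* vma_kernel_pagesize() never reports less than PAGE_SIZE. */
	assert(vma_pagesize >= PAGE_SIZE);

	/* Default gfn, as initialised at the top of user_mem_abort(). */
	*gfn = fault_ipa >> PAGE_SHIFT;

	if (vma_pagesize == PMD_SIZE && !logging_active) {
		*gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
	} else {
		/*
		 * Needed even when vma_pagesize == PMD_SIZE: dirty
		 * logging forces PTE mappings, and any other hugepage
		 * size (e.g. PUD_SIZE) is not mapped at stage 2 yet.
		 */
		vma_pagesize = PAGE_SIZE;
	}
	return vma_pagesize;
}
```

For a PMD-backed VMA with logging_active clear, a fault at IPA 0x40212345 yields PMD_SIZE with gfn 0x40200; with logging_active set, the same fault yields PAGE_SIZE with gfn 0x40212.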

* Pages belonging to memslots that don't have the same
* alignment for userspace and IPA cannot be mapped using
@@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (mmu_notifier_retry(kvm, mmu_seq))
goto out_unlock;
- if (!hugetlb && !force_pte)
- hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+ if (vma_pagesize == PAGE_SIZE && !force_pte) {
+ /*
+ * Only PMD_SIZE transparent hugepages(THP) are
+ * currently supported. This code will need to be
+ * updated to support other THP sizes.
+ */

This comment belongs in transparent_hugepage_adjust(), not here.

I think this is more relevant here than in thp_adjust, unless we rename
the function below to something generic, such as handle_hugepage_adjust().


+ if (transparent_hugepage_adjust(&pfn, &fault_ipa))
+ vma_pagesize = PMD_SIZE;

IIUC, transparent_hugepage_adjust() is only called here. Instead of
returning 'true' when it detects a huge page backing and then doing the
adjustment here, it should rather return the THP size (PMD_SIZE) to
accommodate possible multi-size THP support in the future.

That makes sense.

That's fine.

Btw, on further thought, since we don't have THP support for anything
other than PMD_SIZE, I am dropping the above suggestion. We would need to
change our stage 2 page table manipulation code anyway to support new
sizes, so this can be addressed when we get there, keeping the changes
minimal and specific to PUD huge page support.
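For reference, the dropped suggestion (having the THP helper return the mapping size rather than a bool) could look roughly like the user-space model below. thp_adjust_size(), adjust_mapping_size(), the backed_by_thp flag (standing in for the kernel's compound-page check) and the 4K-granule constants are all hypothetical; the page refcount handling done by the real transparent_hugepage_adjust() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 4K-granule values (assumed). */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (UINT64_C(1) << PAGE_SHIFT)
#define PMD_SIZE     (UINT64_C(1) << 21)
#define PMD_MASK     (~(PMD_SIZE - 1))
/* Small pages per PMD, minus one: 511 with 4K pages. */
#define PMD_PFN_MASK ((PMD_SIZE >> PAGE_SHIFT) - 1)

/*
 * Hypothetical variant of transparent_hugepage_adjust() that returns
 * the mapping size instead of a bool. If the faulting page belongs to
 * a THP, pfn and fault_ipa are aligned down to the huge page.
 */
static uint64_t thp_adjust_size(bool backed_by_thp, uint64_t *pfn,
				uint64_t *fault_ipa)
{
	uint64_t gfn = *fault_ipa >> PAGE_SHIFT;

	if (!backed_by_thp)
		return PAGE_SIZE;

	/* IPA and PA must share the same offset within the PMD. */
	assert((gfn & PMD_PFN_MASK) == (*pfn & PMD_PFN_MASK));

	*fault_ipa &= PMD_MASK;
	*pfn &= ~PMD_PFN_MASK;
	return PMD_SIZE;	/* the only THP size supported today */
}

/* Caller side, mirroring the hunk in user_mem_abort(). */
static uint64_t adjust_mapping_size(uint64_t vma_pagesize, bool force_pte,
				    bool backed_by_thp, uint64_t *pfn,
				    uint64_t *fault_ipa)
{
	if (vma_pagesize == PAGE_SIZE && !force_pte)
		vma_pagesize = thp_adjust_size(backed_by_thp, pfn, fault_ipa);
	return vma_pagesize;
}
```

With the size coming back from the helper, a new THP size would only need a different return value here, although the stage 2 table code would still need the changes mentioned above.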