Re: [PATCH v6 18/18] kvm: arm64: Allow tuning the physical address size for VM

From: Christoffer Dall
Date: Wed Oct 31 2018 - 10:22:21 EST


On Wed, Sep 26, 2018 at 05:32:54PM +0100, Suzuki K Poulose wrote:
> Allow specifying the physical address size limit for a new
> VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
> allows us to finalise the stage2 page table as early as possible
> and hence perform the right checks on the memory slots
> without complication. The size is encoded as Log2(PA_Size) in
> bits[7:0] of the type field. For backward compatibility the
> value 0 is reserved and implies 40bits. Also, lift the limit
> of the IPA to host limit and allow lower IPA sizes (e.g, 32).
>
> The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
> for the availability of this feature. The cap check returns the
> maximum limit for the physical address shift supported by the host.
>
> Cc: Marc Zyngier <marc.zyngier@xxxxxxx>
> Cc: Christoffer Dall <cdall@xxxxxxxxxx>
> Cc: Peter Maydell <peter.maydell@xxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> ---
> Changes since v5:
> - Rename the capability to KVM_CAP_ARM_VM_IPA_SIZE
> - Update Documentation of the API (Peter Maydell)
> - Fix comment/commit-description as spotted by Eric
> Changes since v4:
> - Fold the introduction of the KVM_CAP_ARM_VM_PHYS_SHIFT to this
> patch to allow detection of the availability of the feature for
> userspace.
> - Document the API
> - Restrict the feature only to arm64.
> Changes since V3:
> - Switch to a CAP, that can be checkd via EXTENSIONS on KVM device
> fd, rather than a dedicated ioctl.
> ---
> Documentation/virtual/kvm/api.txt | 31 +++++++++++++++++++++++++
> arch/arm64/include/asm/stage2_pgtable.h | 20 ----------------
> arch/arm64/kvm/reset.c | 17 ++++++++++----
> include/uapi/linux/kvm.h | 10 ++++++++
> 4 files changed, 54 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index c664064f76fb..54eb7c763c89 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -123,6 +123,37 @@ memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
> flag KVM_VM_MIPS_VZ.
>
>
> +On arm64, the physical address size for a VM (IPA Size limit) is limited
> +to 40bits by default. The limit can be configured if the host supports the
> +extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> +KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
> +identifier, where IPA_Bits is the maximum width of any physical
> +address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
> +machine type identifier.
> +
> +e.g, to configure a guest to use 48bit physical address size :
> +
> + vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));
> +
> +The requested size (IPA_Bits) must be :
> + 0 - Implies default size, 40bits (for backward compatibility)
> +
> + or
> +
> + N - Implies N bits, where N is a positive integer such that,
> + 32 <= N <= Host_IPA_Limit
> +
> +Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and
> +is dependent on the CPU capability and the kernel configuration. The limit can
> +be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION
> +ioctl() at run-time.
> +
> +Please note that configuring the IPA size does not affect the capability
> +exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
> +size of the address translated by the stage2 level (guest physical to
> +host physical address translations).
> +
> +
> 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST
>
> Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST
> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 2cce769ba4c6..d352f6df8d2c 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -42,28 +42,8 @@
> * the range (IPA_SHIFT, IPA_SHIFT - 4).
> */
> #define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4)
> -#define STAGE2_PGTABLE_LEVELS stage2_pgtable_levels(KVM_PHYS_SHIFT)
> #define kvm_stage2_levels(kvm) VTCR_EL2_LVLS(kvm->arch.vtcr)
>
> -/*
> - * With all the supported VA_BITs and 40bit guest IPA, the following condition
> - * is always true:
> - *
> - * STAGE2_PGTABLE_LEVELS <= CONFIG_PGTABLE_LEVELS
> - *
> - * We base our stage-2 page table walker helpers on this assumption and
> - * fall back to using the host version of the helper wherever possible.
> - * i.e, if a particular level is not folded (e.g, PUD) at stage2, we fall back
> - * to using the host version, since it is guaranteed it is not folded at host.
> - *
> - * If the condition breaks in the future, we can rearrange the host level
> - * definitions and reuse them for stage2. Till then...
> - */
> -#if STAGE2_PGTABLE_LEVELS > CONFIG_PGTABLE_LEVELS
> -#error "Unsupported combination of guest IPA and host VA_BITS."
> -#endif
> -
> -
> /* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */
> #define stage2_pgdir_shift(kvm) pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
> #define stage2_pgdir_size(kvm) (1ULL << stage2_pgdir_shift(kvm))
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index f156e45760bc..95f28d5950e0 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -89,6 +89,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_VCPU_EVENTS:
> r = 1;
> break;
> + case KVM_CAP_ARM_VM_IPA_SIZE:
> + r = kvm_ipa_limit;
> + break;
> default:
> r = 0;
> }
> @@ -192,17 +195,23 @@ int kvm_arm_config_vm(struct kvm *kvm, unsigned long type)
> u32 parange, phys_shift;
> u8 lvls;
>
> - if (type)
> + if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> return -EINVAL;
>
> + phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
> + if (phys_shift) {
> + if (phys_shift > kvm_ipa_limit ||
> + phys_shift < 32)
> + return -EINVAL;

I am concerned here that if we allow the user to set the phys_size to 32
bits, then we end up with 2 levels of stage2 page tables, which means
that the size of a stage2 pmd mapping becomes the size of a stage2 pgd
mapping, yet we can still decide in user_mem_abort() that a stage2 fault
is backed by PMD size mappings on the host, and attempt a huge mapping
at stage2, which then becomes a PGD level block map, I think.

Is this handled somehow? If so, how?

I can't see user_mem_abort() being modified to explicitly handle this
in your code, but perhaps the stage2_set_pmd_huge() call ends up
actually mapping at the stage2 pte level, but I can't tell that it does.
In any case, I think user_mem_abort() should give up on pmd/pud huge
mappings if the size mapped by the stage2/stage1 pmd/pud levels don't
line up. What do you think?


Thanks,

Christoffer

> + } else {
> + phys_shift = KVM_PHYS_SHIFT;
> + }
> +
> parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7;
> if (parange > ID_AA64MMFR0_PARANGE_MAX)
> parange = ID_AA64MMFR0_PARANGE_MAX;
> vtcr |= parange << VTCR_EL2_PS_SHIFT;
>
> - phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
> - if (phys_shift > KVM_PHYS_SHIFT)
> - phys_shift = KVM_PHYS_SHIFT;
> vtcr |= VTCR_EL2_T0SZ(phys_shift);
> /*
> * Use a minimum 2 level page table to prevent splitting
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 07548de5c988..9b949efcfd32 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -750,6 +750,15 @@ struct kvm_ppc_resize_hpt {
>
> #define KVM_S390_SIE_PAGE_OFFSET 1
>
> +/*
> + * On arm64, machine type can be used to request the physical
> + * address size for the VM. Bits[7-0] are reserved for the guest
> + * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
> + * value 0 implies the default IPA size, 40bits.
> + */
> +#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL
> +#define KVM_VM_TYPE_ARM_IPA_SIZE(x) \
> + ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> /*
> * ioctls for /dev/kvm fds:
> */
> @@ -952,6 +961,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_S390_HPAGE_1M 156
> #define KVM_CAP_NESTED_STATE 157
> #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
> +#define KVM_CAP_ARM_VM_IPA_SIZE 159 /* returns maximum IPA bits for a VM */
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.19.0
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@xxxxxxxxxxxxxxxxxxxxx
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm