Re: [PATCH v6 00/18] kvm: arm64: Dynamic IPA and 52bit IPA
From: Auger Eric
Date: Thu Oct 04 2018 - 04:40:16 EST
Hi Suzuki,
On 9/26/18 6:32 PM, Suzuki K Poulose wrote:
>
> The physical address space size for a VM (IPA size) on arm/arm64 is
> limited to a static limit of 40bits. This series adds support for
> using an IPA size specific to a VM, allowing the use of any size supported
> by the host (based on the host kernel configuration and CPU support).
> The default size remains 40bits. On arm64, we can allow the limit
> to be lowered (while keeping at least 2 levels in stage2, to avoid
> splitting the host PMD huge pages at stage2). We also add support for
> handling 52bit IPA addresses (where supported) added by Arm v8.2
> extensions.
>
> We need to set the IPA limit as early as VM creation, to keep the
> code simple and avoid sprinkling checks everywhere to ensure that the
> IPA is configured. We encode the IPA size in the machine_type
> argument to the KVM_CREATE_VM ioctl. Bits [7-0] of the type are reserved
> for the IPA size. The availability of this feature is advertised by a
> new cap KVM_CAP_ARM_VM_IPA_SIZE. When supported, this capability
> returns the maximum IPA shift supported by the host. The supported IPA
> size on a host could be different from the system's PARange indicated
> by the CPUs (e.g., due to a kernel limit on the PA size).
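A minimal C sketch of the two ideas above, assuming the standard ID_AA64MMFR0_EL1.PARange encodings (0 to 6 meaning 32, 36, 40, 42, 44, 48 and 52 bits): how a host might derive the limit it reports through KVM_CAP_ARM_VM_IPA_SIZE from the CPU's PARange and the kernel's PA limit, and how userspace might pack a requested size into bits [7:0] of the KVM_CREATE_VM type. The macro and function names are hypothetical, and the real host limit may be reduced further by other constraints such as the stage2 level cap.

```c
#include <assert.h>

/* Hypothetical names: bits [7:0] of the KVM_CREATE_VM type hold the IPA size. */
#define SKETCH_VM_TYPE_IPA_MASK 0xffUL
#define SKETCH_VM_TYPE_IPA_SIZE(bits) ((unsigned long)(bits) & SKETCH_VM_TYPE_IPA_MASK)

/* Map an ID_AA64MMFR0_EL1.PARange encoding to a physical address width. */
unsigned int parange_to_phys_shift(unsigned int parange, unsigned int kernel_pa_bits)
{
	switch (parange) {
	case 0: return 32;
	case 1: return 36;
	case 2: return 40;
	case 3: return 42;
	case 4: return 44;
	case 5: return 48;
	case 6: return 52;
	/* Assumption: an unknown (newer) encoding means "at least the kernel limit". */
	default: return kernel_pa_bits;
	}
}

/* Host IPA limit: the CPU's PA range, capped by the kernel's own PA limit. */
unsigned int host_ipa_limit(unsigned int parange, unsigned int kernel_pa_bits)
{
	unsigned int cpu_bits;

	assert(kernel_pa_bits >= 32 && kernel_pa_bits <= 52);
	cpu_bits = parange_to_phys_shift(parange, kernel_pa_bits);
	return cpu_bits < kernel_pa_bits ? cpu_bits : kernel_pa_bits;
}
```

For example, a CPU reporting PARange 6 under a kernel built with 48-bit PAs gives host_ipa_limit(6, 48) == 48. Real userspace would instead read the limit with ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE) and then pass the packed size as the KVM_CREATE_VM type.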
>
> Supporting different IPA sizes requires modifications to the stage2 page
> table code. The arm64 page table level helpers are defined based on the
> page table levels used by the host VA. So, the accessors may not work
> if the guest's stage2 uses more levels than the host's stage1.
> The previous versions (v1 & v2) of this series refactored
> the stage1 page table accessors to reuse the low-level accessors for an
> independent stage2 table. However, due to the level folding in the
> generic code, the types are redefined as well, i.e., if the PUD is
> folded, the pud_t could be defined as:
>
> typedef struct { pgd_t pgd; } pud_t;
>
> and similarly for pmd_t. So, without stage1-independent page table entry
> types for stage2, we could be dealing with a different type for level
> 0-2 entries. This is practically fine on arm/arm64, as the entries
> have a similar format and size and we always use the appropriate
> accessors to get the raw value (i.e., pud_val/pmd_val etc.), but it is
> not ideal for an upstream solution. So, this version caps the stage2 page
> table levels to those of the stage1. This has the following impact on
> the IPA support for various pagesize/host-va combinations:
>
>
> x---------------------------------------------------x
> | host\ipa  | 40bit | 42bit | 44bit | 48bit | 52bit |
> -----------------------------------------------------
> | 39bit-4K  |   y   |   y   |   n   |   n   |  n/a  |
> -----------------------------------------------------
> | 48bit-4K  |   y   |   y   |   y   |   y   |  n/a  |
> -----------------------------------------------------
> | 36bit-16K |   y   |   n   |   n   |   n   |  n/a  |
> -----------------------------------------------------
> | 47bit-16K |   y   |   y   |   y   |   y   |  n/a  |
> -----------------------------------------------------
> | 42bit-64K |   y   |   y   |   y   |   n   |   n   |
> -----------------------------------------------------
> | 48bit-64K |   y   |   y   |   y   |   y   |   y   |
> x---------------------------------------------------x
>
> Equivalently, the following list shows the IPA sizes that cannot be supported:
>
> 39bit-4K host | [44 - 48]
> 36bit-16K host | [41 - 48]
> 42bit-64K host | [47 - 52]
>
> which is not too bad. We can pursue independent stage2
> page table support and lift the restriction once we get there.
> Given there is a proposal for a new generic page table walker [0],
> it would make sense to keep our efforts in sync with it to avoid
> diverging from a common API.
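The table and the list of unsupported sizes can be reproduced with a little arithmetic. The following C sketch assumes that each table level resolves (page_shift - 3) bits, that stage2 may concatenate up to 16 tables at its entry level (covering 4 extra IPA bits without an extra level), and that 52-bit addresses are only available with 64K pages (ARMv8.2-LPA). The function names are hypothetical. It finds the largest IPA whose stage2 walk needs no more levels than the host's stage1.

```c
#include <assert.h>

/*
 * Levels needed to translate `bits` of address with 2^page_shift pages,
 * each level resolving (page_shift - 3) bits. This integer expression
 * equals ceil((bits - page_shift) / (page_shift - 3)).
 */
unsigned int pgtable_levels(unsigned int bits, unsigned int page_shift)
{
	assert(page_shift == 12 || page_shift == 14 || page_shift == 16);
	assert(bits > page_shift);
	return (bits - 4) / (page_shift - 3);
}

/* Concatenating up to 16 entry-level tables absorbs 4 IPA bits. */
unsigned int stage2_levels(unsigned int ipa_bits, unsigned int page_shift)
{
	return pgtable_levels(ipa_bits - 4, page_shift);
}

/* Largest IPA whose stage2 levels fit within the host stage1 levels. */
unsigned int max_ipa_bits(unsigned int host_va_bits, unsigned int page_shift)
{
	unsigned int host_levels = pgtable_levels(host_va_bits, page_shift);
	/* Assumption: 52-bit PAs only with 64K pages, otherwise 48 bits. */
	unsigned int ipa = (page_shift == 16) ? 52 : 48;

	while (stage2_levels(ipa, page_shift) > host_levels)
		ipa--;
	return ipa;
}
```

For the six host configurations in the table this yields 43, 48, 40, 48, 46 and 52 bits respectively, which matches the table and the first unsupported sizes (44, 41 and 47) in the list.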
>
> 52bit support is added for VGIC (including ITS emulation) and handling
> of PAR, HPFAR registers.
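The 52bit handling for PAR/HPFAR and the ITS tables is mostly bit shuffling. Here is a hedged C sketch, assuming the architectural layouts: PAR_EL1.PA carries address bits [pa_bits-1:12], HPFAR_EL2.FIPA carries IPA bits [51:12] in register bits [43:4], and with 64K ITS pages GITS_BASER keeps PA bits [47:16] in place while storing PA bits [51:48] in register bits [15:12]. The helper and macro names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: mask with bits [h:l] set (0 <= l <= h <= 63). */
#define SKETCH_GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Keep the PA field of PAR_EL1 and move IPA[51:12] down to bits [43:4]. */
uint64_t par_to_hpfar(uint64_t par, unsigned int pa_bits)
{
	assert(pa_bits == 48 || pa_bits == 52);
	return (par & SKETCH_GENMASK_ULL(pa_bits - 1, 12)) >> 8;
}

/* Pack a 64K-aligned 52-bit PA into the GITS_BASER address layout. */
uint64_t its_baser_pack(uint64_t phys)
{
	return (phys & SKETCH_GENMASK_ULL(47, 16)) | (((phys >> 48) & 0xf) << 12);
}

/* Recover the 52-bit PA from a GITS_BASER address field. */
uint64_t its_baser_unpack(uint64_t baser)
{
	return (baser & SKETCH_GENMASK_ULL(47, 16)) | (((baser >> 12) & 0xf) << 48);
}
```

With a 48-bit PA limit, par_to_hpfar() simply drops address bits [51:48]; with 52 bits they land in HPFAR bits [43:40].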
>
> The series applies on top of v4.19-rc4. A tree is available here:
>
> git://linux-arm.org/linux-skp.git ipa52/v6
>
> Tested with:
> - Modified kvmtool (patches included in the series for reference /
> testing), which can only be used:
> * with virtio-pci up to 44bit PA (due to the 4K page size for legacy
> virtio-pci implemented by kvmtool)
> * up to 48bit PA with virtio-mmio, due to the 32bit PFN limitation.
> - Hacked Qemu (boot loader support for highmem, IPA size support)
> * with virtio-pci, GIC-v3 ITS & MSI up to 52bit on the Foundation model.
> Also see [1] for Qemu support.
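The kvmtool limits follow from the 32-bit queue PFN: the reachable range is 2^(32 + page_shift) bytes. A small C sketch of that arithmetic, assuming legacy virtio-pci uses a fixed 4K page (as stated above) and that the 48bit virtio-mmio figure corresponds to a 64K guest page size (our assumption; legacy virtio-mmio lets the guest program its page size). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address bits reachable by a pfn_bits-wide PFN with 2^page_shift pages. */
unsigned int queue_pa_bits(unsigned int pfn_bits, unsigned int page_shift)
{
	assert(pfn_bits + page_shift < 64);
	return pfn_bits + page_shift;
}

/* Last byte address of that reachable range. */
uint64_t queue_max_addr(unsigned int pfn_bits, unsigned int page_shift)
{
	return (UINT64_C(1) << queue_pa_bits(pfn_bits, page_shift)) - 1;
}
```

With 4K pages this gives 44 bits, and with 64K pages 48 bits.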
>
> [0] https://lkml.org/lkml/2018/4/24/777
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05759.html
>
> Changes since v5:
> - Don't raise the IPA limit to 40bits on systems with a lower PA size.
> This doesn't break backward compatibility: we still allow KVM_CREATE_VM
> to succeed with "0" as the IPA size (40bits), but prevent specifying
> 40bits explicitly when the limit is lower.
> - Rename CAP, KVM_CAP_ARM_VM_PHYS_SHIFT => KVM_CAP_ARM_VM_IPA_SIZE
> and helper, KVM_VM_TYPE_ARM_VM_PHY_SHIFT => KVM_VM_TYPE_ARM_VM_IPA_SIZE
> - Update Documentation of the API
> - Update comments and commit description as reported by Eric
> - Set the missing TCR_T0SZ in patch "kvm: arm64: Configure VTCR_EL2 per VM"
> - Fix bits for CBASER_ADDRESS mask, GITS_CBASER_ADDRESS()
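The first v5 change boils down to a small decision rule. A C sketch with hypothetical names, assuming that bits [7:0] of the KVM_CREATE_VM type carry the IPA size and that any other set bit is rejected (our assumption about the remaining reserved bits). It checks only the upper bound; the real code may apply further checks.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_DEFAULT_IPA_BITS 40
#define SKETCH_VM_TYPE_IPA_MASK 0xffUL

/* Returns the IPA size to use for a new VM, or -EINVAL. */
int sketch_vm_ipa_bits(unsigned long type, unsigned int host_limit)
{
	unsigned int ipa = (unsigned int)(type & SKETCH_VM_TYPE_IPA_MASK);

	assert(host_limit >= 32 && host_limit <= 52);
	/* Assumption: bits outside [7:0] are reserved and must be zero. */
	if (type & ~SKETCH_VM_TYPE_IPA_MASK)
		return -EINVAL;
	/* 0 keeps the legacy 40-bit default, even below the host limit. */
	if (ipa == 0)
		return SKETCH_DEFAULT_IPA_BITS;
	/* An explicit size, including 40, must fit within the host limit. */
	if (ipa > host_limit)
		return -EINVAL;
	return (int)ipa;
}
```

So on a host limited to 36 bits, a type of 0 still yields a 40-bit VM for backward compatibility, while an explicit request for 40 fails.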
>
> Changes since V4:
> - Rebased on v4.19-rc3
> - Dropped virtio patches queued already by mst.
> - Collect Acks from Christoffer
> - Restrict IPA configuration support to arm64 only
> - Use KVM_CAP_ARM_VM_PHYS_SHIFT for detecting the support for
> IPA size configuration along with the limit on the IPA for the host.
> - Update comments on __load_guest_stage2
> - Add comment about the default value for unknown PARange values.
> - Update Documentation of the API
>
> Changes since V3:
> - Use per-VM VTCR instead of per-VM private VTCR bits
> - Allow IPA less than 40bits
> - Split the patch adding support for stage2 dynamic page tables
> - Rearrange the series to keep the userspace API at the end, which
> needs further discussion.
> - Collect Reviews/Acks from Eric & Marc
>
> Changes since V2:
> - Drop "refactoring of host page table helpers" and restrict the IPA size
> to make sure stage2 doesn't use more page table levels than that of the host.
> - Load VTCR for TLB operations on behalf of the VM (Pointed-by: James Morse)
> - Split a couple of patches to make them easier to review.
> - Fall back to normal (non-concatenated) entry level page table support if
> possible.
> - Bump the IOCTL number
>
> Changes since V1:
> - Change the userspace API for configuring VM to encode the IPA
> size in the VM type. (suggested by Christoffer)
> - Expose the IPA limit on the host via ioctl on /dev/kvm
> - Handle 52bit addresses in PAR & HPFAR
> - Drop patch changing the lifetime of the stage2 PGD
> - Rename macros for 48-to-52 bit conversion for GIC ITS BASER.
> (suggested by Christoffer)
> - Split virtio PFN check patches and address comments.
>
>
> Kristina Martsenko (1):
> vgic: Add support for 52bit guest physical address
>
> Suzuki K Poulose (17):
> kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
> kvm: arm/arm64: Remove spurious WARN_ON
> kvm: arm64: Add helper for loading the stage2 setting for a VM
> arm64: Add a helper for PARange to physical shift conversion
> kvm: arm64: Clean up VTCR_EL2 initialisation
> kvm: arm/arm64: Allow arch specific configurations for VM
> kvm: arm64: Configure VTCR_EL2 per VM
> kvm: arm/arm64: Prepare for VM specific stage2 translations
> kvm: arm64: Prepare for dynamic stage2 page table layout
> kvm: arm64: Make stage2 page table layout dynamic
> kvm: arm64: Dynamic configuration of VTTBR mask
> kvm: arm64: Configure VTCR_EL2.SL0 per VM
> kvm: arm64: Switch to per VM IPA limit
> kvm: arm64: Add 52bit support for PAR to HPFAR conversion
> kvm: arm64: Set a limit on the IPA size
> kvm: arm64: Limit the minimum number of page table levels
> kvm: arm64: Allow tuning the physical address size for VM
>
> Documentation/virtual/kvm/api.txt | 31 +++
> arch/arm/include/asm/kvm_arm.h | 3 +-
> arch/arm/include/asm/kvm_host.h | 7 +
> arch/arm/include/asm/kvm_mmu.h | 15 +-
> arch/arm/include/asm/stage2_pgtable.h | 50 ++--
> arch/arm64/include/asm/cpufeature.h | 20 ++
> arch/arm64/include/asm/kvm_arm.h | 157 +++++++++---
> arch/arm64/include/asm/kvm_asm.h | 2 -
> arch/arm64/include/asm/kvm_host.h | 16 +-
> arch/arm64/include/asm/kvm_hyp.h | 10 +
> arch/arm64/include/asm/kvm_mmu.h | 42 +++-
> arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 ----
> arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 ---
> arch/arm64/include/asm/stage2_pgtable.h | 236 +++++++++++++-----
> arch/arm64/kvm/hyp/Makefile | 1 -
> arch/arm64/kvm/hyp/s2-setup.c | 90 -------
> arch/arm64/kvm/hyp/switch.c | 4 +-
> arch/arm64/kvm/hyp/tlb.c | 4 +-
> arch/arm64/kvm/reset.c | 103 ++++++++
> include/linux/irqchip/arm-gic-v3.h | 5 +
> include/uapi/linux/kvm.h | 10 +
> virt/kvm/arm/arm.c | 9 +-
> virt/kvm/arm/mmu.c | 120 ++++-----
> virt/kvm/arm/vgic/vgic-its.c | 36 +--
> virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +-
> virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 -
> 26 files changed, 648 insertions(+), 408 deletions(-)
> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
> delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
> delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c
>
> kvmtool changes:
>
> Suzuki K Poulose (4):
> kvmtool: Allow backends to run checks on the KVM device fd
> kvmtool: arm64: Add support for guest physical address size
> kvmtool: arm64: Switch memory layout
> kvmtool: arm: Add support for creating VM with PA size
>
> arm/aarch32/include/kvm/kvm-arch.h | 6 ++--
> arm/aarch64/include/kvm/kvm-arch.h | 15 ++++++++--
> arm/aarch64/include/kvm/kvm-config-arch.h | 5 +++-
> arm/include/arm-common/kvm-arch.h | 17 ++++++++----
> arm/include/arm-common/kvm-config-arch.h | 1 +
> arm/kvm.c | 34 ++++++++++++++++++++++-
> include/kvm/kvm.h | 4 +++
> kvm.c | 2 ++
> 8 files changed, 71 insertions(+), 13 deletions(-)
>
Feel free to add
Tested-by: Eric Auger <eric.auger@xxxxxxxxxx>
I tested this series with QEMU, using a cold-plugged 4GB PC-DIMM at 2TB on
a Gigabyte machine. The VM is created with 43 IPA bits. I ran memtester
in the guest at 2TB using "memtester -p 20000000000 1G 1" and it succeeded.
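A quick check of the numbers: the memtester address 0x20000000000 is 2^41 bytes (2TB), and a 4GB DIMM placed there ends at 0x200FFFFFFFF, which needs 42 address bits, so 43 IPA bits cover it. A hedged C sketch of that calculation (the helper name is hypothetical); it only considers the DIMM, not any other regions QEMU may place in the guest address map.

```c
#include <assert.h>
#include <stdint.h>

/* Address bits needed to reach the last byte of [base, base + size). */
unsigned int ipa_bits_needed(uint64_t base, uint64_t size)
{
	uint64_t last;
	unsigned int bits = 0;

	assert(size > 0 && base + size > base);
	last = base + size - 1;
	/* Count the position of the highest set bit of the last address. */
	while (last) {
		bits++;
		last >>= 1;
	}
	return bits;
}
```

ipa_bits_needed(0x20000000000ULL, 4ULL << 30) returns 42.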
Thanks
Eric