Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature

From: Steven Price
Date: Wed Nov 18 2020 - 11:01:46 EST


On 17/11/2020 19:35, Marc Zyngier wrote:
Hi Steven,

On 2020-10-26 15:57, Steven Price wrote:
Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
for a VM. This exposes the feature to the guest and automatically tags
memory pages touched by the VM as PG_mte_tagged (and clears the tags
storage) to ensure that the guest cannot see stale tags, and so that the
tags are correctly saved/restored across swap.

Signed-off-by: Steven Price <steven.price@xxxxxxx>
Reviewed-by: Andrew Jones <drjones@xxxxxxxxxx>
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +++
 arch/arm64/include/asm/kvm_host.h    |  3 +++
 arch/arm64/kvm/arm.c                 |  9 +++++++++
 arch/arm64/kvm/mmu.c                 | 20 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            |  6 +++++-
 include/uapi/linux/kvm.h             |  1 +
 6 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h
b/arch/arm64/include/asm/kvm_emulate.h
index 5ef2669ccd6c..66c0d9e7c2b4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -79,6 +79,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
     if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
         vcpu_el1_is_32bit(vcpu))
         vcpu->arch.hcr_el2 |= HCR_TID2;
+
+    if (vcpu->kvm->arch.mte_enabled)

Please add a predicate (vcpu_has_mte() or kvm_has_mte()?) for this.

Sure

+        vcpu->arch.hcr_el2 |= HCR_ATA;
 }

 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h
b/arch/arm64/include/asm/kvm_host.h
index 95ab7345dcc8..cd993aec0440 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -118,6 +118,9 @@ struct kvm_arch {
      */
     unsigned long *pmu_filter;
     unsigned int pmuver;
+
+    /* Memory Tagging Extension enabled for the guest */
+    bool mte_enabled;
 };

 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f56122eedffc..7ee93bcac017 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -89,6 +89,12 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
         r = 0;
         kvm->arch.return_nisv_io_abort_to_user = true;
         break;
+    case KVM_CAP_ARM_MTE:
+        if (!system_supports_mte() || kvm->created_vcpus)
+            return -EINVAL;

You also want to avoid 32bit guests. Also, what is the rational for

Interesting point, however if I understand correctly the 32 bit flag is a VCPU flag. And at this point kvm->created_vcpus==0, so I don't believe we actually know whether the guest is 32 bit or not at the point of this test. And since this is a per-VM flag it actually can make sense for a heterogeneous VM if anyone is crazy enough to want such a thing.

this being a VM capability as opposed to a CPU feature, similar
to SVE and PMU?

v1/v2 actually had it as a CPU feature. However you need a per-VM flag to enforce the use of tagged memory (the code in user_mem_abort() below).

+        r = 0;
+        kvm->arch.mte_enabled = true;
+        break;
     default:
         r = -EINVAL;
         break;
@@ -210,6 +216,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
          */
         r = 1;
         break;
+    case KVM_CAP_ARM_MTE:
+        r = system_supports_mte();

Same comment about 32bit.

As above, we don't know if we're launching a 32 bit guest or not.

+        break;
     case KVM_CAP_STEAL_TIME:
         r = kvm_arm_pvtime_supported();
         break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 19aacc7d64de..38fe25310ca1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
phys_addr_t fault_ipa,
     if (vma_pagesize == PAGE_SIZE && !force_pte)
         vma_pagesize = transparent_hugepage_adjust(memslot, hva,
                                &pfn, &fault_ipa);
+
+    /*
+     * The otherwise redundant test for system_supports_mte() allows the
+     * code to be compiled out when CONFIG_ARM64_MTE is not present.
+     */
+    if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
+        /*
+         * VM will be able to see the page's tags, so we must ensure
+         * they have been initialised.
+         */
+        struct page *page = pfn_to_page(pfn);
+        long i, nr_pages = compound_nr(page);
+
+        /* if PG_mte_tagged is set, tags have already been initialised */
+        for (i = 0; i < nr_pages; i++, page++) {
+            if (!test_and_set_bit(PG_mte_tagged, &page->flags))
+                mte_clear_page_tags(page_address(page));
+        }
+    }

What are the visibility requirements for the tags, specially if the
guest has its MMU off? Is there any cache management that needs to
occur?

If the guest has its MMU off then the memory would be tread as Untagged by the architecture (the stage 1 page table must provide the 'tagged' flag). Architecturally the tag bits handled the same as the data bits so no extra/different cache management is required. The only exception in the architecture is that the tag values are optionally exposed in the normal PA space (in a potentially non-coherent way) - but if that is the case that PA space shouldn't be touched by Linux.

Another thing is device-like memory that is managed by userspace,
such as the QEMU emulated flash, for which there also might be tags.
How is that dealt with? In general, what are the expectations for
tags on memory shared between host and guest? Who owns them?

Actual device-like memory shouldn't be expected to have tags - they wouldn't have tags on a real host.

In terms of memory shared between host and guest - the tags are also shared, effectively tags are just data. Clearly the host and guest need to decide how to share the tag space. In general I would expect the tags to be ignored (and tag checking to be disabled) in those shared regions. Sadly the architecture doesn't provide a method to prevent the guest accessing tags in a region (without also crippling cacheability).

This is one of the areas that is potentially tricky for migration because the VMM may want to use MTE but must disable tag checking while touching the guest's memory because the guest may be using the tag memory for it's own purposes. There is some discussion about this in the cover letter of v2:

https://lore.kernel.org/kvmarm/20200904160018.29481-1-steven.price@xxxxxxx/
Thanks,

Steve

+
     if (writable) {
         prot |= KVM_PGTABLE_PROT_W;
         kvm_set_pfn_dirty(pfn);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 430e36e1a13d..35a3dc448231 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1132,7 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
             arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED)
             val |= (1UL << ID_AA64PFR0_CSV2_SHIFT);
     } else if (id == SYS_ID_AA64PFR1_EL1) {
-        val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
+        if (!vcpu->kvm->arch.mte_enabled)
+            val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
     } else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
         val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
              (0xfUL << ID_AA64ISAR1_API_SHIFT) |
@@ -1394,6 +1395,9 @@ static bool access_mte_regs(struct kvm_vcpu
*vcpu, struct sys_reg_params *p,
 static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
                    const struct sys_reg_desc *rd)
 {
+    if (vcpu->kvm->arch.mte_enabled)
+        return 0;
+
     return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
 }

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..3e6fb5b580a9 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_ARM_MTE 191

 #ifdef KVM_CAP_IRQ_ROUTING

Thanks,

        M.