Re: [PATCH v2 5/7] KVM: arm64: MTE: Use stage-2 NoTagAccess memory attribute if supported

From: Suzuki K Poulose
Date: Tue Jan 14 2025 - 04:56:01 EST


Hi,


On 13/01/2025 20:47, Peter Collingbourne wrote:
On Mon, Jan 13, 2025 at 11:09 AM Catalin Marinas
<catalin.marinas@xxxxxxx> wrote:

On Sat, Jan 11, 2025 at 06:49:55PM +0530, Aneesh Kumar K.V wrote:
Catalin Marinas <catalin.marinas@xxxxxxx> writes:
On Fri, Jan 10, 2025 at 04:30:21PM +0530, Aneesh Kumar K.V (Arm) wrote:
Currently, the kernel won't start a guest if the MTE feature is enabled

...

@@ -2152,7 +2162,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if (!vma)
break;

- if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
+ if (kvm_has_mte(kvm) &&
+ !kvm_has_mte_perm(kvm) && !kvm_vma_mte_allowed(vma)) {
ret = -EINVAL;
break;
}

I don't think we should change this, or at least not how it's done above
(Suzuki raised a related issue internally relaxing this for VM_PFNMAP).

For standard memory slots, we want to reject them upfront rather than
deferring to the fault handler. An example here is file mmap() passed as
standard RAM to the VM. It's an unnecessary change in behaviour IMHO.
I'd only relax this for VM_PFNMAP mappings further down in this
function (and move the VM_PFNMAP check above; see Suzuki's internal
patch, unless he posted it publicly already).

For the record, here is the patch Catalin was referring to.


--->8---

kvm: arm64: Allow device mappings with MTE

When a VM is configured to use MTE, we prevent mapping a "Device" to the VM.

e.g: with kvmtool (with additional debugging to dump error code and addresses)

$ lkvm run ... --vfio 0000:01:00.0 ...

Info: Using IOMMU type 3 for VFIO container
Error: 0000:01:00.0: failed to register region with KVM 50020000-50022000: -22
Warning: [0abc:aced] Error activating emulation for BAR 0
Error: 0000:01:00.0: failed to configure regions
Warning: Failed init: vfio__init

Only check for the MTE allowed for a VMA if it is not backed by "device"

Fixes: ea7fc1bb1cd1b ("KVM: arm64: Introduce MTE VM feature")
Cc: Steven Price <steven.price at arm.com>
Cc: Marc Zyngier <maz at kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose at arm.com>
---
arch/arm64/kvm/mmu.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8b2d803e558a..7e084f22e185 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2295,17 +2295,15 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if (!vma)
break;

- if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
- ret = -EINVAL;
- break;
- }
-
if (vma->vm_flags & VM_PFNMAP) {
/* IO region dirty page logging not allowed */
if (new->flags & KVM_MEM_LOG_DIRTY_PAGES) {
ret = -EINVAL;
break;
}
+ } else if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
+ ret = -EINVAL;
+ break;
}
hva = min(reg_end, vma->vm_end);
} while (hva < reg_end);
--
2.34.1


Suzuki



But we want to handle memslots backed by pagecache pages for virtio-shm
here (virtiofs dax use case).

Ah, I forgot about this use case. So with virtiofs DAX, does a host page
cache page (host VMM mmap()) get mapped directly into the guest as a
separate memory slot? In this case, the host vma would not have
VM_MTE_ALLOWED set.

With MTE_PERM, we can essentially skip the
kvm_vma_mte_allowed(vma) check because we handle all types in the fault
handler.

This was pretty much the early behaviour when we added KVM support for
MTE, allow !VM_MTE_ALLOWED and trap them later. However, we disallowed
VM_SHARED because of some non-trivial race. Commit d89585fbb308 ("KVM:
arm64: unify the tests for VMAs in memslots when MTE is enabled")
changed this behaviour and the VM_MTE_ALLOWED check happens upfront. A
subsequent commit removed the VM_SHARED check.

It's a minor ABI change but I'm trying to figure out why we needed this
upfront check rather than simply dropping the VM_SHARED check. Adding
Peter in case he remembers. I can't see any race if we simply skipped
this check altogether, irrespective of FEAT_MTE_PERM.

I don't see a problem with removing the upfront check. The reason I
kept the check was IIRC just that there was already a check there and
its logic needed to be adjusted for my VM_SHARED changes.

Peter