Re: [PATCH v5 0/2] MTE support for KVM guest

From: Steven Price
Date: Fri Nov 20 2020 - 04:51:00 EST


On 19/11/2020 19:11, Marc Zyngier wrote:
On 2020-11-19 18:42, Andrew Jones wrote:
On Thu, Nov 19, 2020 at 03:45:40PM +0000, Peter Maydell wrote:
On Thu, 19 Nov 2020 at 15:39, Steven Price <steven.price@xxxxxxx> wrote:
> This series adds support for Arm's Memory Tagging Extension (MTE) to
> KVM, allowing KVM guests to make use of it. This builds on the existing
> user space support already in v5.10-rc1, see [1] for an overview.

> The change to require the VMM to map all guest memory PROT_MTE is
> significant as it means that the VMM has to deal with the MTE tags even
> if it doesn't care about them (e.g. for virtual devices or if the VMM
> doesn't support migration). Also unfortunately because the VMM can
> change the memory layout at any time the check for PROT_MTE/VM_MTE has
> to be done very late (at the point of faulting pages into stage 2).

I'm a bit dubious about requiring the VMM to map the guest memory
PROT_MTE unless somebody's done at least a sketch of the design
for how this would work on the QEMU side. Currently QEMU just
assumes the guest memory is guest memory and it can access it
without special precautions...


There are two statements being made here:

1) Requiring the use of PROT_MTE when mapping guest memory may not fit
   QEMU well.

2) New KVM features should be accompanied by supporting QEMU code in
   order to prove that the APIs make sense.

I strongly agree with (2). While kvmtool supports some quick testing, it
doesn't support migration. We must test all new features with a
migration-supporting VMM.

I'm not sure about (1). I don't feel like it should be a major problem,
but see (2).

(1) seems to be contentious whichever way we go. Either PROT_MTE isn't required, in which case it's easy to accidentally screw up migration, or it is required, in which case it's difficult to handle normal guest memory from the VMM. I get the impression that I should probably go back to the previous approach - sorry for the distraction with this change.

(2) isn't something I'm trying to skip, but I'm limited in what I can do myself so would appreciate help here. Haibo is looking into this.


I'd be happy to help with the QEMU prototype, but preferably when there's
hardware available. Has all the current MTE testing just been done on
simulators? And, if so, are there regression tests regularly running on
the simulators too? And can they test migration? If hardware doesn't
show up quickly and simulators aren't used for regression tests, then
all this code will start rotting from day one.

As Marc says, hardware isn't available. Testing is either via the Arm FVP model (that I've been using for most of my testing) or QEMU full system emulation.


While I agree with the sentiment, the reality is pretty bleak.

I'm pretty sure nobody will ever run a migration on emulation. I also doubt
there is much overlap between MTE users and migration users, unfortunately.

No HW is available today, and when it becomes available, it will be in
the form of a closed system on which QEMU doesn't run, either because
we are locked out of EL2 (as usual), or because migration is not part of
the use case (like KVM on Android, for example).

So we can wait another two (five?) years until general purpose HW becomes
available, or we start merging what we can test today. I'm inclined to
do the latter.

And I think it is absolutely fine for QEMU to say "no MTE support with KVM"
(we can remove all userspace visibility, except for the capability).

What I'm trying to achieve is a situation where KVM+MTE without migration works and we leave ourselves a clear path where migration can be added. With hindsight I think this version of the series was a wrong turn - if we return to not requiring PROT_MTE then we have the following two potential options to explore for migration in the future:

* The VMM can choose to enable PROT_MTE if it needs to, and if desired we can add a flag to enforce this in the kernel.

* If needed, a new kernel interface can be provided to fetch/set tags from guest memory which isn't mapped PROT_MTE.
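
As an illustration of the first option, a VMM that wants tag access could opt in when it allocates the memory backing guest RAM. This is only a rough sketch and not code from this series or from QEMU: map_guest_ram() and its want_mte parameter are hypothetical names, and the fallback PROT_MTE definition (0x20, the value in the arm64 uapi <asm/mman.h>) is there only so the example builds on other architectures.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* PROT_MTE is arm64-specific. Fallback so the sketch compiles elsewhere. */
#ifndef PROT_MTE
#define PROT_MTE 0x20
#endif

/*
 * Hypothetical VMM helper: allocate anonymous memory to back guest RAM.
 *
 * want_mte != 0 requests a tagged mapping, i.e. the VMM opts in to
 * PROT_MTE because it needs to read or write the guest's tags (for
 * example to migrate them). want_mte == 0 gives a plain mapping, which
 * is what a VMM that does not care about tags would keep using.
 *
 * Returns NULL on failure.
 */
void *map_guest_ram(size_t size, int want_mte)
{
    int prot = PROT_READ | PROT_WRITE;
    void *p;

    assert(size > 0);

    if (want_mte)
        prot |= PROT_MTE;

    p = mmap(NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A VMM without migration support would call map_guest_ram(size, 0) and never see the tags. One that needs them would pass want_mte = 1 and then has to treat every access to that mapping as a tagged access.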

Does this sound reasonable?

I'll clean up the set_pte_at() change and post a v6 later today.