Re: [PATCH v5 0/2] MTE support for KVM guest

From: Catalin Marinas
Date: Wed Dec 09 2020 - 10:28:58 EST


On Wed, Dec 09, 2020 at 01:25:18PM +0000, Marc Zyngier wrote:
> On 2020-12-09 12:44, Catalin Marinas wrote:
> > On Tue, Dec 08, 2020 at 06:21:12PM +0000, Marc Zyngier wrote:
> > > On 2020-12-08 17:21, Catalin Marinas wrote:
> > > > On Mon, Dec 07, 2020 at 07:03:13PM +0000, Marc Zyngier wrote:
> > > > > I wonder whether we will have to have something kernel side to
> > > > > dump/reload tags in a way that matches the patterns used by live
> > > > > migration.
> > > >
> > > > We have something related - ptrace dumps/restores the tags. Can the same
> > > > concept be expanded to a KVM ioctl?
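To expand on this: the ptrace interface copies one tag byte per
16-byte MTE granule through a struct iovec. A minimal sketch of a tag
dump using it; PTRACE_PEEKMTETAGS is arm64-specific and the fallback
define is only there to keep the example self-contained:

  #include <sys/ptrace.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  #ifndef PTRACE_PEEKMTETAGS
  #define PTRACE_PEEKMTETAGS 33      /* arm64 asm/ptrace.h */
  #endif

  /* Copy up to nr_tags tag bytes for the tracee range starting at
   * addr. Returns the number of tags copied or -1 on error. */
  static ssize_t dump_tags(pid_t pid, void *addr, char *tags,
                           size_t nr_tags)
  {
          struct iovec iov = {
                  .iov_base = tags,  /* one tag byte per granule */
                  .iov_len = nr_tags,
          };

          if (ptrace(PTRACE_PEEKMTETAGS, pid, addr, &iov))
                  return -1;

          /* the kernel updates iov_len with the tags copied */
          return iov.iov_len;
  }

A KVM ioctl could mirror this interface, with the tracee pid replaced
by the VM fd.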
> > >
> > > Yes, although I wonder whether we should integrate this deeply into
> > > the dirty-log mechanism: it would be really interesting to dump the
> > > tags at the point where the page is flagged as clean from a dirty-log
> > > point of view. As the page is dirtied, discard the saved tags.
> >
> > From the VMM perspective, the tags can be treated just like additional
> > (meta)data in a page. We'd only need the tags when copying over. It can
> > race with the VM dirtying the page (writing tags would dirty it) but I
> > don't think the current migration code cares about this. If dirtied, it
> > copies it again.
> >
> > The only downside I see is an extra syscall per page, on both the source
> > VMM and the destination one, to dump/restore the tags. Is this a
> > performance issue?
>
> I'm not sure. Migrating VMs already has a massive overhead, so an extra
> syscall per page isn't terrifying. But that's the point where I admit
> not knowing enough about what the VMM expects, nor whether that matches
> what happens on other architectures that deal with per-page metadata.
>
> Would this syscall operate on the guest address space? Or on the VMM's
> own mapping?

Whatever is easier for the VMM; I don't think it matters as long as the
host kernel can get at the actual physical address (and its linear map
correspondent). It's probably simpler if the ioctl takes a VMM address,
since the kernel can then check the access permissions, in case you want
to hide the guest memory from the VMM for other reasons (in which case
migration is off the table anyway).
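To make this concrete, I'd imagine something along the lines below. The
structure and names are made up for illustration only, not an ABI
proposal:

  #include <linux/types.h>

  /*
   * Hypothetical ioctl argument: copy the tags for a range of guest
   * memory, identified by its VMM mapping, to or from a VMM buffer.
   */
  struct kvm_mte_copy_tags {
          __u64 vmm_addr;    /* VMM mapping of the guest pages */
          __u64 length;      /* bytes, multiple of the 16-byte granule */
          __u64 tags;        /* buffer: one tag byte per granule */
          __u64 flags;       /* direction: dump vs restore */
  };

Operating on a range rather than a single page would also amortise the
per-page syscall overhead mentioned above.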

Without extra syscalls, an option would be for the VMM to create two
mappings of the guest memory: one with PROT_MTE for migration and the
other without for normal use (DMA etc.). That's achievable using
memfd_create() or shm_open() and two mmap() calls, only one of them with
PROT_MTE. The VMM address space would need to be large enough to map the
guest IPA range twice.
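A rough sketch, assuming PROT_MTE is accepted on such a shared (memfd)
mapping; PROT_MTE and its value are arm64-specific and the fallback
define is only there to keep the example self-contained:

  #define _GNU_SOURCE                /* for memfd_create() */
  #include <sys/mman.h>
  #include <unistd.h>

  #ifndef PROT_MTE
  #define PROT_MTE 0x20              /* arm64 asm/mman.h */
  #endif

  /*
   * Map guest_size bytes of guest RAM twice: a plain alias for the
   * memslot, DMA etc. and a tagged alias only used by migration.
   */
  static int map_guest_ram(size_t guest_size, void **ram, void **ram_mte)
  {
          int fd = memfd_create("guest-ram", 0);

          if (fd < 0 || ftruncate(fd, guest_size))
                  return -1;

          *ram = mmap(NULL, guest_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
          *ram_mte = mmap(NULL, guest_size,
                          PROT_READ | PROT_WRITE | PROT_MTE,
                          MAP_SHARED, fd, 0);

          return (*ram == MAP_FAILED || *ram_mte == MAP_FAILED) ? -1 : 0;
  }

The migration code would then access the tags through the ram_mte alias
(e.g. LDG/STG via inline asm) while everything else keeps using the
untagged mapping.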

--
Catalin