Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

From: Kirill A . Shutemov
Date: Fri Sep 09 2022 - 19:02:18 EST


On Fri, Sep 09, 2022 at 12:11:05PM -0700, Andy Lutomirski wrote:
>
>
> On Fri, Sep 9, 2022, at 7:32 AM, Kirill A . Shutemov wrote:
> > On Thu, Sep 08, 2022 at 09:48:35PM -0700, Andy Lutomirski wrote:
> >> On 8/19/22 17:27, Kirill A. Shutemov wrote:
> >> > On Thu, Aug 18, 2022 at 08:00:41PM -0700, Hugh Dickins wrote:
> >> > > On Thu, 18 Aug 2022, Kirill A . Shutemov wrote:
> >> > > > On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
> >> > > > >
> >> > > > > If your memory could be swapped, that would be enough of a good reason
> >> > > > > to make use of shmem.c: but it cannot be swapped; and although there
> >> > > > > are some references in the mailthreads to it perhaps being swappable
> >> > > > > in future, I get the impression that will not happen soon if ever.
> >> > > > >
> >> > > > > If your memory could be migrated, that would be some reason to use
> >> > > > > filesystem page cache (because page migration happens to understand
> >> > > > > that type of memory): but it cannot be migrated.
> >> > > >
> >> > > > Migration support is in the pipeline. It is part of TDX 1.5 [1]. Swapping is
> >> > > > theoretically possible, but I'm not aware of any plans as of now.
> >> > > >
> >> > > > [1] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
> >> > >
> >> > > I always forget, migration means different things to different audiences.
> >> > > As an mm person, I was meaning page migration, whereas a virtualization
> >> > > person thinks VM live migration (which that reference appears to be about),
> >> > > a scheduler person task migration, an ornithologist bird migration, etc.
> >> > >
> >> > > But you're an mm person too: you may have cited that reference in the
> >> > > knowledge that TDX 1.5 Live Migration will entail page migration of the
> >> > > kind I'm thinking of. (Anyway, it's not important to clarify that here.)
> >> >
> >> > TDX 1.5 brings both.
> >> >
> >> > In TDX speak, mm migration is called relocation. See TDH.MEM.PAGE.RELOCATE.
> >> >
> >>
> >> This seems to be a pretty bad fit for the way that the core mm migrates
> >> pages. The core mm unmaps the page, then moves (in software) the contents
> >> to a new address, then faults it in. TDH.MEM.PAGE.RELOCATE doesn't fit into
> >> that workflow very well. I'm not saying it can't be done, but it won't just
> >> work.
> >
> > Hm. From what I see we have all necessary infrastructure in place.
> >
> > Unmapping is a NOP for inaccessible pages as they are never mapped, and we
> > have the mapping->a_ops->migrate_folio() callback that allows replacing
> > software copying with whatever is needed, like TDH.MEM.PAGE.RELOCATE.
> >
> > What am I missing?
>
> Hmm, maybe this isn't as bad as I thought.
>
> Right now, unless I've missed something, the migration workflow is to
> unmap (via try_to_migrate) all mappings, then migrate the backing store
> (with ->migrate_folio(), although it seems like most callers expect the
> actual copy to happen outside of ->migrate_folio(),

Most? I guess you are talking about MIGRATE_SYNC_NO_COPY, right? AFAICS,
it is an HMM thing, not a common one.

> and then make new
> mappings. With the *current* (vma-based, not fd-based) model for KVM
> memory, this won't work -- we can't unmap before calling
> TDH.MEM.PAGE.RELOCATE.

We don't need to unmap. The page is not mapped from the core-mm PoV.

> But maybe it's actually okay with some care or maybe mild modifications
> with the fd-based model. We don't have any mmaps, per se, to unmap for
> secret / INACCESSIBLE memory. So maybe we can get all the way to
> ->migrate_folio() without zapping anything in the secure EPT and just
> call TDH.MEM.PAGE.RELOCATE from inside migrate_folio(). And there will
> be nothing to fault back in. From the core code's perspective, it's
> like migrating a memfd that doesn't happen to have any mappings at the
> time.

Modifications are needed if we want to initiate migration from userspace.
IIRC, we don't have any API that can initiate page migration for file
ranges without mapping the file.

But the kernel can do it fine for its own housekeeping: compaction, for
example, doesn't need any VMA. And we need compaction working for the
long-term stability of the system.
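
To make the flow above concrete, here is a rough userspace toy model. It is
not kernel code and not a proposed implementation: every name in it
(toy_folio, toy_aops, toy_unmap, toy_relocate, toy_migrate) is made up for
illustration. It only mimics the shape of the argument: the unmap step has
nothing to do when the folio has no mappings, and the per-mapping callback
(playing the role of mapping->a_ops->migrate_folio()) is free to move the
contents with a backend-specific operation instead of a plain software copy.

```c
#include <assert.h>
#include <string.h>

/* Toy userspace model; none of these are real kernel types or APIs. */
struct toy_folio {
	int mapcount;		/* number of userspace mappings */
	char data[16];		/* stand-in for the page contents */
};

struct toy_aops {
	/* plays the role of a_ops->migrate_folio(): move src into dst */
	int (*migrate_folio)(struct toy_folio *dst, struct toy_folio *src);
};

static int relocate_calls;

/*
 * Hypothetical backend hook. In the real discussion this is where
 * something like TDH.MEM.PAGE.RELOCATE would go; here it is just a
 * memcpy() that also scrubs the source.
 */
static int toy_relocate(struct toy_folio *dst, struct toy_folio *src)
{
	relocate_calls++;
	memcpy(dst->data, src->data, sizeof(dst->data));
	memset(src->data, 0, sizeof(src->data));
	return 0;
}

/* Drop all mappings and report how many were zapped. */
static int toy_unmap(struct toy_folio *folio)
{
	int zapped = folio->mapcount;

	folio->mapcount = 0;
	return zapped;
}

/* Generic flow: unmap, then hand the move to the callback. */
static int toy_migrate(const struct toy_aops *aops, struct toy_folio *dst,
		       struct toy_folio *src, int *zapped)
{
	assert(aops && aops->migrate_folio && dst && src && zapped);

	*zapped = toy_unmap(src);	/* 0 for a never-mapped folio */
	return aops->migrate_folio(dst, src);
}
```

Calling toy_migrate() with a source folio whose mapcount is 0 reports zero
zapped mappings and reaches toy_relocate() exactly once, which is the
"nothing to unmap, nothing to fault back in" case described above.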

--
Kiryl Shutsemau / Kirill A. Shutemov