Re: [PATCH v2 2/4] mm: x86: Invoke hypercall when page encryption status is changed
From: Ashish Kalra
Date: Wed May 19 2021 - 08:06:56 EST
On Mon, May 17, 2021 at 07:01:09PM -0700, Steve Rutherford wrote:
> On Fri, May 14, 2021 at 3:05 AM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> > On Fri, May 14, 2021 at 11:34:32AM +0200, Borislav Petkov wrote:
> > > On Fri, May 14, 2021 at 09:05:23AM +0000, Ashish Kalra wrote:
> > > > Ideally we should fail/stop migration even if a single guest page's
> > > > encryption status cannot be notified, and that should be the way to
> > > > proceed: the guest kernel should notify the source
> > > > userspace VMM to block/stop migration in this case.
> > >
> > > Yap, and what I'm trying to point out here is that if the low level
> > > machinery fails for whatever reason and it cannot recover, we should
> > > propagate that error up the chain so that the user is aware as to why it
> > > failed.
> > >
> > I totally agree.
> > > WARN is a good first start but in some configurations those splats don't
> > > even get shown as people don't look at dmesg, etc.
> > >
> > > And I think it is very important to propagate those errors properly
> > > because there's a *lot* of moving parts involved in a guest migration
> > > and you have encrypted memory which makes debugging this probably even
> > > harder, etc, etc.
> > >
> > > I hope this makes more sense.
> > >
> > > > From a practical side, I do see QEMU's migrate_add_blocker() interface,
> > > > but that looks to be a static interface and also I don't think it will
> > > > force-stop an ongoing migration. Is there an existing mechanism
> > > > to inform the userspace VMM from the kernel about blocking/stopping migration?
> > >
> > > Hmm, so __set_memory_enc_dec() which calls
> > > notify_addr_enc_status_changed() is called by the guest, right, when it
> > > starts migrating.
> > >
> > No, actually notify_addr_enc_status_changed() is called whenever a range
> > of memory is marked as encrypted or decrypted, so it has nothing to do
> > with migration as such.
> > This is basically modifying the encryption attributes on the page tables
> > and correspondingly also making the hypercall to inform the hypervisor about
> > page encryption status changes. The hypervisor will use this information
> > during an ongoing or future migration, so this information is maintained
> > even though migration might never be initiated here.
> > > Can an error value from it be propagated up the callchain so it can be
> > > turned into an error message for the guest owner to see?
> > >
> > The error value cannot be propagated up the callchain directly
> > here, but one possibility is to leverage the hypercall and use Sean's
> > proposed hypercall interface to notify the host/hypervisor to block/stop
> > any future/ongoing migration.
> > Or as from Paolo's response, writing 0 to MIGRATION_CONTROL MSR seems
> > more ideal.
> > Thanks,
> > Ashish
> How realistic is this type of failure? If you've gotten this deep, it
> seems like something has gone very wrong if the memory you are about
> to mark as shared (or encrypted) doesn't exist and isn't mapped. In
> particular, is the kernel going to page fault when it tries to
> reinitialize the page it's currently changing the c-bit of? From what
> I recall, most paths that do either set_memory_encrypted or
> set_memory_decrypted memset the region being toggled. Note: dma_pool
> doesn't immediately memset, but the VA it's providing to set_decrypted
> is freshly fetched from a recently allocated region (something has to
> have gone pretty wrong if this is invalid, if I'm not mistaken; no one
> would think twice if you wrote to that freshly allocated page).
> The reason I mention this is that SEV migration is going to be the
> least of your concerns if you are already on a one-way train towards a
> Kernel oops. I'm not certain I would go so far as to say this should
> BUG() instead (I think the page fault on access might be easier to
> debug than a BUG here), but I'm pretty skeptical that the kernel is going
> to do too well if it doesn't know if its kernel VAs are valid.
> If, despite the above, we expect to infrequently-but-not-never disable
> migration with no intention of reenabling it, we should signal it
> differently than we currently signal migration enablement. Currently,
> if you toggle migration from on to off there is an implication that
> you are about to reboot, and you are only ephemerally unable to
> migrate. Having permanent disablement be indistinguishable from a
> really long reboot is a recipe for a really sad long timeout in
Also, looking at the set_memory_encrypted() and set_memory_decrypted() usage
patterns, the persistent decrypted regions such as the DMA pool,
bss_decrypted, etc., are set up at boot time. Later, the unused
bss_decrypted section is freed and set_memory_encrypted() is called on
the freed memory at the end of kernel boot.
Most of the runtime set_memory_encrypted() and set_memory_decrypted() calls
will be on dynamically allocated DMA buffers via dma_alloc_coherent().
For example, a dma_alloc_coherent() request will allocate pages and then call
set_memory_decrypted() on them. When dma_free_coherent() is called,
set_memory_encrypted() is called on the pages about to be freed,
before they are actually freed.
Now, these buffers are very short-lived and only used for immediate I/O
before being freed, so they may not be of major concern for SEV migration.
Disabling migration for an address-lookup or mapping failure
on such pages would really be overkill.
So I might be in favor of Steve's thoughts above of doing a BUG() here.