Re: [PATCHv7 00/14] mm, x86/cc: Implement support for unaccepted memory

From: Marc Orr
Date: Fri Jun 24 2022 - 13:06:21 EST


On Fri, Jun 24, 2022 at 9:57 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> Peter, is your enter key broken? You seem to be typing all your text in
> a single unreadable paragraph.
>
> On 6/24/22 09:37, Peter Gonda wrote:
> > if a customer incorrectly labels their image it may fail to boot..
>
> You're saying that firmware basically has two choices:
> 1. Accept all the memory up front and boot slowly, but reliably
> 2. Use thus "unaccepted memory" mechanism, boot fast, but risk that the
> VM loses a bunch of memory.
>
> If the guest can't even boot because of a lack of memory, then the
> pre-accepted chunk is probably too small in the first place.
>
> If the customer screws up, they lose a bunch of the RAM they paid for.
> That seems like a rather self-correcting problem to me.

I think Peter's point is a little more nuanced than that. Once lazy
accept goes into the guest firmware -- without the feature negotiation
that Peter is suggesting -- cloud providers now have a bookkeeping
problem. Which images have kernels that can boot from a guest firmware
that doesn't pre-validate all the guest memory?

The way we've been solving similar bookkeeping problems up to now
(e.g., Which guest can run with CVM features like TDX/SEV enabled?
Which SEV guests can live migrate?) is as follows. We tag images with
feature tags. But this is sort of a hack. And not a great one. It's
confusing to customers, hard for the cloud service provider to
support, and easy to mess up.

It would be better if the guest FW knew whether or not the kernel it
was going to launch supported lazy accept.

That being said, this does seem like a difficult problem to solve,
since it's sort of backward from how things work, in that when the
guest firmware wants to switch between pre-validating all memory vs.
minimizing what it pre-validates, the guest kernel is not running yet!
But if there is some way to do this, it would be a huge improvement
over the current status quo of pushing the feature negotiation up to
the cloud service provider and ultimately the cloud customer.