Re: [EXTERNAL] Re: [PATCH 4/4] kvm: svm: Support KVM_SEV_SNP_PAGE_TYPE_VMSA at SNP_LAUNCH_UPDATE

From: Sean Christopherson

Date: Tue Jun 23 2026 - 16:24:13 EST

On Tue, Jun 23, 2026, Jon Lange wrote:
> On Tuesday, June 23, 2026 6:40 AM, Sean Christopherson wrote:
> > On Wed, Jun 17, 2026, Jörg Rödel wrote:
> > > On Wed, Jun 17, 2026 at 06:37:52AM -0700, Sean Christopherson wrote:
> > > > Ok, so it took us a few times to learn our lesson. I still don't see that as a
> > > > strong argument for new uAPI, especially not for VMSA pages. I am very firmly
> > > > of the opinion that letting anything but the host kernel configure the VMSA is
> > > > beyond stupid, but unfortunately we're stuck with AP_CREATION. Expanding that
> > > > surface has a very, very, VERY high bar to get over.
> > >
> > > The strongest argument in my view (and the main reason we are doing this) is
> > > actually the predictable launch measurement. On SEV-SNP this is a requirement
> > > to use platform VM-identity features like the ID Block.
> >
> > And I'm saying that unless KVM *can't* provide a predictable launch measurement,
> > which AIUI isn't the case, then the launch measurement *must* be stable across
> > kernels because it's part of KVM's ABI. So as I see it, the issue isn't that
> > KVM is inherently unpredictable, it's that we lack tests to validate a thorny,
> > subtle piece of KVM's ABI.
>
> Joerg is suggesting that we need a launch measurement that is stable not just
> across multiple launches on the same system, but across multiple hypervisors.

*sigh*

So that, and also the multi-VMPL implications, absolutely need to be described in
about this level of detail in the cover letter, and the changelog needs about the
same level of documentation to justify the various design decisions.

Bluntly, all of the changelogs in this series are awful. +200 lines of code in
arguably the nastiest bit of "architecture" KVM has to deal with, and the longest
changelog barely hits 6 lines.

And whatever uAPI we end up with needs tests, and a lot of them, including coverage
for negative testcases. Because I'm working on fixing the third? guest-exploitable
DoS that's unique to SNP this year, and I've reached my breaking point: I'm not
taking new functionality like this without sufficient test coverage.

Rant(s) aside, thanks for the information, it's super helpful!