Re: [PATCH] x86/sev-es: Do not use copy_from_kernel_nofault in early #VC handler

From: Tom Lendacky
Date: Fri Sep 08 2023 - 09:14:08 EST


On 9/7/23 14:12, Dave Hansen wrote:
On 9/7/23 11:03, Adam Dunlap wrote:
On Thu, Sep 7, 2023 at 10:06 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
How about something like the attached patch?

I think that's a much better idea, but unfortunately we can't rely on
boot_cpu_data being 0'd out. While it is a static global variable that C says
should be 0'd, the early interrupt happens before .bss is cleared (Note if it
happens to be 0, then the __is_canonical_address check succeeds anyway -- the
boot failure only happens when that variable happens to be other random values).
If there's another check we could do, I'd agree that'd end up being a much
better patch. For example, maybe we could add a field to cpuinfo_x86 is_valid
that is manually (i.e. not part of the regular .bss clearing) set to false and
is only set to true after early_identify_cpu is finished. Or the
simplest thing would be to just manually set x86_virt_bits to 0 somewhere super early.

Gah, what a mess. So the guilty CPUID in question is this:

/* Setup and Load IDT */
call early_setup_idt

/* Check if nx is implemented */
movl $0x80000001, %eax
cpuid
movl %edx,%edi

which is _barely_ after we have an IDT with which to generate
exceptions. What happens before this? This isn't the first CPUID
invocation. Does this one just happen to #VC and all the others before
don't?

In any case, the most straightforward way out of this mess is to just
move boot_cpu_data out of .bss and explicitly initialize it along with
some documentation explaining the situation.

Agreed, putting this in the .data section will ensure it is zero from the start.


Also, what's the root cause here? What's causing the early exception?
It is some silly CPUID leaf? Should we be more careful to just avoid
these exceptions?

Yes, it is a CPUID instruction in secondary_startup_64 with the comment /* Check
if nx is implemented */. It appears to be pretty important. Potentially we could
paravirtualize the CPUID direclty (i.e. mess with the GHCB and make the vmgexit)
instead of taking the #VC, but I'm not sure that's a great idea.

What TDX does here is actually pretty nice. It doesn't generate a #VE
for these.

But seriously, is it even *possible* to spin up a SEV-SNP VM what
doesn't have NX?

It is a common path, so while an SEV guest would have NX support, you would first have to determine that it is an SEV guest. That would take issuing a CPUID instruction in order to determine if a particular MSR can be read...

Ultimately, we could probably pass the encryption mask from the decompressor to the kernel and avoid some of the checks during early boot of the kernel proper. Is it possible to boot an x86 kernel without going through the decompressor?

Thanks,
Tom


There's a couple of other CPUIDs that are called in early_identify_cpu between
when x86_virt_bits is set to 48 and when it is set to its real value (IIUC, it
may be set to 57 if 5 level paging is enabled), which could potentially cause
spurious failures in those later calls.

That should be easy enough to fix.

These things where we initialize a value and then write over it are
always fragile. Let's just make one function that does it right and
does it once.

Totally untested patch attached.