On 9/7/23 11:03, Adam Dunlap wrote:
On Thu, Sep 7, 2023 at 10:06 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
How about something like the attached patch?
I think that's a much better idea, but unfortunately we can't rely on
boot_cpu_data being 0'd out. While it is a static global variable that C says
should be 0'd, the early interrupt happens before .bss is cleared (Note if it
happens to be 0, then the __is_canonical_address check succeeds anyway -- the
boot failure only happens when that variable happens to be other random values).
If there's another check we could do, I'd agree that'd end up being a much
better patch. For example, maybe we could add a field to cpuinfo_x86 is_valid
that is manually (i.e. not part of the regular .bss clearing) set to false and
is only set to true after early_identify_cpu is finished. Or the
simplest thing would be to just manually set x86_virt_bits to 0 somewhere super early.
Gah, what a mess. So the guilty CPUID in question is this:
/* Setup and Load IDT */
call early_setup_idt
/* Check if nx is implemented */
movl $0x80000001, %eax
cpuid
movl %edx,%edi
which is _barely_ after we have an IDT with which to generate
exceptions. What happens before this? This isn't the first CPUID
invocation. Does this one just happen to #VC and all the others before
don't?
In any case, the most straightforward way out of this mess is to just
move boot_cpu_data out of .bss and explicitly initialize it along with
some documentation explaining the situation.
Also, what's the root cause here? What's causing the early exception?
It is some silly CPUID leaf? Should we be more careful to just avoid
these exceptions?
Yes, it is a CPUID instruction in secondary_startup_64 with the comment /* Check
if nx is implemented */. It appears to be pretty important. Potentially we could
paravirtualize the CPUID direclty (i.e. mess with the GHCB and make the vmgexit)
instead of taking the #VC, but I'm not sure that's a great idea.
What TDX does here is actually pretty nice. It doesn't generate a #VE
for these.
But seriously, is it even *possible* to spin up a SEV-SNP VM what
doesn't have NX?
There's a couple of other CPUIDs that are called in early_identify_cpu between
when x86_virt_bits is set to 48 and when it is set to its real value (IIUC, it
may be set to 57 if 5 level paging is enabled), which could potentially cause
spurious failures in those later calls.
That should be easy enough to fix.
These things where we initialize a value and then write over it are
always fragile. Let's just make one function that does it right and
does it once.
Totally untested patch attached.