Re: [PATCH] x86/sev-es: Do not use copy_from_kernel_nofault in early #VC handler

From: Dave Hansen
Date: Thu Sep 07 2023 - 15:12:27 EST


On 9/7/23 11:03, Adam Dunlap wrote:
> On Thu, Sep 7, 2023 at 10:06 AM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>> How about something like the attached patch?
>
> I think that's a much better idea, but unfortunately we can't rely on
> boot_cpu_data being 0'd out. While it is a static global variable that C says
> should be 0'd, the early interrupt happens before .bss is cleared (Note if it
> happens to be 0, then the __is_canonical_address check succeeds anyway -- the
> boot failure only happens when that variable happens to be other random values).
> If there's another check we could do, I'd agree that'd end up being a much
> better patch. For example, maybe we could add a field to cpuinfo_x86 is_valid
> that is manually (i.e. not part of the regular .bss clearing) set to false and
> is only set to true after early_identify_cpu is finished. Or the
> simplest thing would be to just manually set x86_virt_bits to 0 somewhere super early.

Gah, what a mess. So the guilty CPUID in question is this:

> /* Setup and Load IDT */
> call early_setup_idt
>
> /* Check if nx is implemented */
> movl $0x80000001, %eax
> cpuid
> movl %edx,%edi

which is _barely_ after we have an IDT with which to generate
exceptions. What happens before this? This isn't the first CPUID
invocation. Does this one just happen to #VC and all the others before
don't?

In any case, the most straightforward way out of this mess is to just
move boot_cpu_data out of .bss and explicitly initialize it along with
some documentation explaining the situation.

>> Also, what's the root cause here? What's causing the early exception?
>> It is some silly CPUID leaf? Should we be more careful to just avoid
>> these exceptions?
>
> Yes, it is a CPUID instruction in secondary_startup_64 with the comment /* Check
> if nx is implemented */. It appears to be pretty important. Potentially we could
> paravirtualize the CPUID direclty (i.e. mess with the GHCB and make the vmgexit)
> instead of taking the #VC, but I'm not sure that's a great idea.

What TDX does here is actually pretty nice. It doesn't generate a #VE
for these.

But seriously, is it even *possible* to spin up a SEV-SNP VM what
doesn't have NX?

> There's a couple of other CPUIDs that are called in early_identify_cpu between
> when x86_virt_bits is set to 48 and when it is set to its real value (IIUC, it
> may be set to 57 if 5 level paging is enabled), which could potentially cause
> spurious failures in those later calls.

That should be easy enough to fix.

These things where we initialize a value and then write over it are
always fragile. Let's just make one function that does it right and
does it once.

Totally untested patch attached.diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0ba1067f4e5f1..6dafe052d1756 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1099,17 +1099,32 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
void get_cpu_address_sizes(struct cpuinfo_x86 *c)
{
u32 eax, ebx, ecx, edx;
+ bool vp_bits_from_cpuid = true;

- if (c->extended_cpuid_level >= 0x80000008) {
+ if (!cpu_has(c, X86_FEATURE_CPUID) ||
+ (c->extended_cpuid_level < 0x80000008))
+ vp_bits_from_cpuid = false;
+
+ if (vp_bits_from_cpuid) {
cpuid(0x80000008, &eax, &ebx, &ecx, &edx);

c->x86_virt_bits = (eax >> 8) & 0xff;
c->x86_phys_bits = eax & 0xff;
+ } else {
+ if (IS_ENABLED(CONFIG_X86_64)) {
+ c->x86_clflush_size = 64;
+ c->x86_phys_bits = 36;
+ c->x86_virt_bits = 48;
+ } else {
+ c->x86_clflush_size = 32;
+ c->x86_virt_bits = 32;
+ c->x86_phys_bits = 32;
+
+ if (cpu_has(c, X86_FEATURE_PAE) ||
+ cpu_has(c, X86_FEATURE_PSE36))
+ c->x86_phys_bits = 36;
+ }
}
-#ifdef CONFIG_X86_32
- else if (cpu_has(c, X86_FEATURE_PAE) || cpu_has(c, X86_FEATURE_PSE36))
- c->x86_phys_bits = 36;
-#endif
c->x86_cache_bits = c->x86_phys_bits;
}

@@ -1539,15 +1554,6 @@ static void __init cpu_parse_early_param(void)
*/
static void __init early_identify_cpu(struct cpuinfo_x86 *c)
{
-#ifdef CONFIG_X86_64
- c->x86_clflush_size = 64;
- c->x86_phys_bits = 36;
- c->x86_virt_bits = 48;
-#else
- c->x86_clflush_size = 32;
- c->x86_phys_bits = 32;
- c->x86_virt_bits = 32;
-#endif
c->x86_cache_alignment = c->x86_clflush_size;

memset(&c->x86_capability, 0, sizeof(c->x86_capability));
@@ -1561,7 +1567,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
cpu_detect(c);
get_cpu_vendor(c);
get_cpu_cap(c);
- get_cpu_address_sizes(c);
setup_force_cpu_cap(X86_FEATURE_CPUID);
cpu_parse_early_param();

@@ -1577,6 +1582,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
setup_clear_cpu_cap(X86_FEATURE_CPUID);
}

+ get_cpu_address_sizes(c);
+
setup_force_cpu_cap(X86_FEATURE_ALWAYS);

cpu_set_bug_bits(c);