Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

From: David Woodhouse
Date: Fri Jan 28 2022 - 04:55:18 EST


On Fri, 2021-12-17 at 14:55 -0600, Tom Lendacky wrote:
> On 12/17/21 2:13 PM, David Woodhouse wrote:
> > On Fri, 2021-12-17 at 13:46 -0600, Tom Lendacky wrote:
> > > There's no WARN or PANIC, just a reset. I can look to try and capture some
> > > KVM trace data if that would help. If so, let me know what events you'd
> > > like captured.
> >
> >
> > Could start with just kvm_run_exit?
> >
> > Reason 8 would be KVM_EXIT_SHUTDOWN and would potentially indicate a
> > triple fault.
>
> qemu-system-x86-24093 [005] ..... 1601.759486: kvm_exit: vcpu 112 reason shutdown rip 0xffffffff81070574 info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x80000b08 error_code 0x00000000
>
> # addr2line -e woodhouse-build-x86_64/vmlinux 0xffffffff81070574
> /root/kernels/woodhouse-build-x86_64/./arch/x86/include/asm/desc.h:272
>
> Which is: asm volatile("ltr %w0"::"q" (GDT_ENTRY_TSS*8));

So, I remain utterly bemused by this. The Milan *guests* I have access
to can't even kexec with a stock kernel: that is also "too fast", and
they take a triple fault during bringup in much the same way, even
without my parallel patches and even going back to fairly old kernels.

I wasn't able to follow up with raw serial output during bringup to
pinpoint precisely where it happens, because the VM would tear itself
down in response to the triple fault without flushing the last of the
virtual serial output :)

It would be really useful to get access to a suitable host where I can
spawn this in qemu and watch it fail. At this point I suspect a
chip-specific quirk or bug.

In the short term, I might suggest that we unblock the parallel bringup
work by simply not doing it on affected chips... but that won't fix
kexec, which already fails on them.