Re: Xen PV seems to be broken on Linus' tree

From: Andy Lutomirski
Date: Tue Nov 21 2017 - 23:12:09 EST


On Tue, Nov 21, 2017 at 7:33 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> I'm doing:
>
> /usr/bin/qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -net none \
>     -nographic -kernel xen-4.8.2 -initrd './arch/x86/boot/bzImage' -m 2G \
>     -smp 2 -append console=com1
>
> With Linus' commit c8a0739b185d11d6e2ca7ad9f5835841d1cfc765 and the
> attached config.
>
> It dies with a bunch of sensible log lines and then:
>
> (XEN) d0v0 Unhandled invalid opcode fault/trap [#6, ec=0000]
> (XEN) domain_crash_sync called from entry.S: fault at ffff82d08023961a entry.o#create_bounce_frame+0x137/0x146
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-4.8.2 x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e033:[<ffffffff811226eb>]
> (XEN) RFLAGS: 0000000000000296 EM: 1 CONTEXT: pv guest (d0v0)
> (XEN) rax: 000000000000002f rbx: ffffffff81e65a48 rcx: ffffffff81e71288
> (XEN) rdx: ffffffff81e27500 rsi: 0000000000000001 rdi: ffffffff81133f88
> (XEN) rbp: 0000000000000000 rsp: ffffffff81e03e78 r8: 0000000000000000
> (XEN) r9: 0000000000000001 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000001 r14: 0000000000000001
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000003506e0
> (XEN) cr3: 000000007b0b3000 cr2: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
> (XEN) Guest stack trace from rsp=ffffffff81e03e78:
> (XEN) ffffffff81e71288 0000000000000000 ffffffff811226eb 000000010000e030
> (XEN) 0000000000010096 ffffffff81e03eb8 000000000000e02b ffffffff811226eb
> (XEN) ffffffff81122c2e 0000000000000200 0000000000000000 0000000000000000
> (XEN) 0000000000000030 ffffffff81c69cf5 ffffffff81080b20 ffffffff81080560
> (XEN) 0000000000000000 ffffffff810d3741 ffffffff8107b420 ffffffff81094660
>
> Is this familiar?
>
> I'll feel really dumb if it ends up being my fault.

Nah, it's broken at least back to v4.13, and I suspect it's config-related.
objdump gives me this:

ffffffff8112b0e1:  e9 e8 fe ff ff          jmpq   ffffffff8112afce <check_flags.part.42+0x4e>
ffffffff8112b0e6:  48 c7 c6 2d f8 c8 81    mov    $0xffffffff81c8f82d,%rsi
ffffffff8112b0ed:  48 c7 c7 58 b9 c8 81    mov    $0xffffffff81c8b958,%rdi
ffffffff8112b0f4:  e8 13 2d 01 00          callq  ffffffff8113de0c <printk>
ffffffff8112b0f9:  0f ff                   (bad)  <-- crash here

That's "ud0" (0f ff), which is used by WARN. So we're probably hitting an
early warning, and Xen probably has something busted with early
exception handling.

Anyone want to debug it and fix it?