Re: BUG: unable to handle kernel paging request from pty_write [was: Linux 4.4.2]

From: Paolo Bonzini
Date: Mon Feb 29 2016 - 10:46:08 EST




On 26/02/2016 18:12, Linus Torvalds wrote:
> It does feel like CPU state corruption - either due to a qemu bug, or
> due to some odd trap/interrupt handling bug of ours.
>
> Or possibly a CPU/microcode bug. You wouldn't happen to run this on an
> AMD Piledriver-based CPU with the 0x06000832 microcode?
>
> Because we do have a pending qemu-related bug-report that turned out
> to be a AMD microcode problem with NMI delivery. Looking at that bug
> report, it actually looks rather similar - also due to a confused RIP.

Just a couple of notes about QEMU and KVM...

First, if you suspect a QEMU or KVM bug, feel free to Cc me.

Second, people generally say "QEMU" because that's what the SMBIOS info
says, but it's helpful to distinguish the two. Nowadays it's almost
always KVM, but at least Intel was running tests on QEMU's binary
translator (no VT-x, no KVM) because it supported SMEP and SMAP long
before hardware with those features was common. Similarly, the next
version of QEMU should support PKE, so perhaps someone will be using it
again to play with PKE.
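
As a rough illustration of telling the two apart from inside a guest:
the hypervisor signature in CPUID leaf 0x40000000 (EBX, ECX, EDX) reads
"KVMKVMKVM" under KVM, and newer QEMU versions report "TCGTCGTCGTCG"
under the binary translator. The sketch below only decodes register
values that you pass in by hand, since Python cannot execute CPUID
itself. The function names and the lookup table are hypothetical and do
not come from any existing tool.

```python
import struct

# 12-byte hypervisor signatures from CPUID leaf 0x40000000.
# Only the two relevant here are listed; the table is illustrative.
SIGNATURES = {
    "KVMKVMKVM": "KVM",
    "TCGTCGTCGTCG": "QEMU TCG (binary translator)",
}


def decode_signature(ebx, ecx, edx):
    """Pack EBX, ECX, EDX little-endian and strip trailing NUL padding."""
    raw = struct.pack("<III", ebx, ecx, edx)
    return raw.rstrip(b"\0").decode("ascii", errors="replace")


def classify(ebx, ecx, edx):
    """Map the decoded signature to a name, or report it as unknown."""
    sig = decode_signature(ebx, ecx, edx)
    return SIGNATURES.get(sig, "unknown (%r)" % sig)


if __name__ == "__main__":
    # Register values corresponding to "KVMKVMKVM\0\0\0".
    print(classify(0x4B4D564B, 0x564B4D56, 0x0000004D))
```

An empty or unrecognized signature says nothing either way, so treat
"unknown" as "check the QEMU command line" rather than as an answer.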

Third, suspected QEMU bugs almost always end up being QEMU bugs, but KVM
bugs rarely show up as random crashes in a Linux guest. KVM does
_really_ little these days unless the host is swapping. (The puzzling
aspect of the NMI microcode issue was that it was a plausible KVM bug,
but such a KVM bug would have shown up either on Intel as well or, if
AMD-only, on other kinds of interrupts besides NMIs.) On the other
hand, if your host is swapping and you hit a KVM bug, it's the host that
would crash, not the guest.

Thanks,

Paolo