Re: [BUG] x86: failed to boot a kernel on a Ryzen machine

From: Masami Hiramatsu
Date: Tue Apr 25 2017 - 10:59:13 EST


Hello,

2017-04-24 22:09 GMT+09:00 Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx>:
> At Mon, 24 Apr 2017 14:48:46 +0200,
> Borislav Petkov wrote:
>>
>> On Mon, Apr 24, 2017 at 09:39:12PM +0900, Satoru Takeuchi wrote:
>> > I used the following auto-test tool (its backend is ktest).
>> >
>> > https://github.com/satoru-takeuchi/elkdat
>> >
>> > This problem can be reproduced by the following command on Ubuntu 16.04.
>> >
>> > ```
>> > $ sudo apt-get install git vagrant libvirt-bin libvirt-dev kernel-package qemu-kvm libssl-dev libncurses5-dev
>>
>> Can you minimize that reproducer? I.e, can you dump only the qemu
>> command line options from this setup?
>>
>> They're enough to be able to start a guest with your config without me
>> having to install all that other stuff.
>
> OK. Is it sufficient information?
>
> ```
> qemu-system-x86_64 -enable-kvm -name elkdat_ktest -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu Opteron_G3,+smap,+adx,+rdseed,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+perfctr_nb,+perfctr_core,+topoext,+tce,+wdt,+skinit,+osvw,+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+pdpe1gb,+fxsr_opt,+mmxext,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+movbe,+sse4.2,+sse4.1,+fma,+ssse3,+pclmuldq,+ht,+vme -m 512 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 12de0e96-5d01-4ab0-b0b3-165f55999960 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-elkdat_ktest/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/elkdat_ktest.img,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e4:6f:3e,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
> ```

I could also reproduce this with Fedora 25 on a Core i7-4770S,
using the options below.

"-cpu Opteron_G3,+smap,+adx,+rdseed,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+perfctr_nb,+perfctr_core,+topoext,+tce,+wdt,+skinit,+osvw,+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+pdpe1gb,+fxsr_opt,+mmxext,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+movbe,+sse4.2,+sse4.1,+fma,+ssse3,+pclmuldq,+vme -M pc -enable-kvm"

A quick investigation showed that the crash also happens when I
replace "Opteron_G3" with "Opteron_G2", "Opteron_G1", "Westmere"
or "Nehalem" (I didn't check models older than that). But I didn't
see the crash when I specified "Opteron_G4" or "Opteron_G5", or
models newer than "SandyBridge".
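
In case someone wants to repeat the model sweep, here is a rough
Python sketch that only builds the option strings I varied. The
helper name qemu_options() and the two model lists are just my
labels for illustration, and the kernel/disk options are left out
on purpose because they depend on your own setup.

```python
import shlex

# Feature flags copied from the -cpu option quoted above.
FLAGS = (
    "+smap,+adx,+rdseed,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,"
    "+perfctr_nb,+perfctr_core,+topoext,+tce,+wdt,+skinit,+osvw,"
    "+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+pdpe1gb,"
    "+fxsr_opt,+mmxext,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,"
    "+movbe,+sse4.2,+sse4.1,+fma,+ssse3,+pclmuldq,+vme"
)

# Models from the observations above (labels are hypothetical names).
CRASHING = ["Opteron_G3", "Opteron_G2", "Opteron_G1", "Westmere", "Nehalem"]
NOT_CRASHING = ["Opteron_G4", "Opteron_G5"]


def qemu_options(model, flags=FLAGS):
    """Return the CPU-related qemu options for one model.

    Only -cpu, -M and -enable-kvm are produced; kernel, disk and
    console options must be appended by the caller.
    """
    return ["-cpu", f"{model},{flags}", "-M", "pc", "-enable-kvm"]


if __name__ == "__main__":
    # Print one option string per model so they can be pasted after
    # the rest of a qemu-system-x86_64 command line.
    for model in CRASHING + NOT_CRASHING:
        print(shlex.join(qemu_options(model)))
```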

So I guess this may be caused by a combination of CPU model and
flags that cannot exist on real hardware. Maybe qemu changes the
available instruction set based on the CPU model, while Linux
checks only the CPU feature flags.
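
To compare what a guest that does boot (e.g. with "Opteron_G4")
reports, something like the sketch below could be run inside it.
It only reads the standard /proc/cpuinfo fields; the function names
are made up for this mail, and it does not prove anything about
what qemu really emulates, it just shows the model and flag view
that Linux has.

```python
def parse_first_cpu(cpuinfo_text):
    """Return the key/value fields of the first processor entry."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(cpuinfo_text, wanted_flags):
    """Report vendor/family/model and whether each wanted flag is set."""
    cpu = parse_first_cpu(cpuinfo_text)
    flags = set(cpu.get("flags", "").split())
    return {
        "vendor_id": cpu.get("vendor_id"),
        "cpu family": cpu.get("cpu family"),
        "model": cpu.get("model"),
        "flags": {name: name in flags for name in wanted_flags},
    }

# Usage inside a guest (flag names as they appear in /proc/cpuinfo):
#   with open("/proc/cpuinfo") as f:
#       print(summarize(f.read(), ["avx2", "smap", "bmi2"]))
```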

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>