Re: [BUG] x86: failed to boot a kernel on a Ryzen machine

From: Satoru Takeuchi
Date: Wed Apr 26 2017 - 07:56:49 EST


At Tue, 25 Apr 2017 23:58:50 +0900,
Masami Hiramatsu wrote:
>
> Hello,
>
> 2017-04-24 22:09 GMT+09:00 Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx>:
> > At Mon, 24 Apr 2017 14:48:46 +0200,
> > Borislav Petkov wrote:
> >>
> >> On Mon, Apr 24, 2017 at 09:39:12PM +0900, Satoru Takeuchi wrote:
> >> > I used the following auto-test tool (its backend is ktest).
> >> >
> >> > https://github.com/satoru-takeuchi/elkdat
> >> >
> >> > This problem can be reproduced by the following command on Ubuntu 16.04.
> >> >
> >> > ```
> >> > $ sudo apt-get install git vagrant libvirt-bin libvirt-dev kernel-package qemu-kvm libssl-dev libncurses5-dev
> >>
> >> Can you minimize that reproducer? I.e, can you dump only the qemu
> >> command line options from this setup?
> >>
> >> They're enough to be able to start a guest with your config without me
> >> having to install all that other stuff.
> >
> > OK. Is it sufficient information?
> >
> > ```
> > qemu-system-x86_64 -enable-kvm -name elkdat_ktest -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu Opteron_G3,+smap,+adx,+rdseed,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+perfctr_nb,+perfctr_core,+topoext,+tce,+wdt,+skinit,+osvw,+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+pdpe1gb,+fxsr_opt,+mmxext,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+movbe,+sse4.2,+sse4.1,+fma,+ssse3,+pclmuldq,+ht,+vme -m 512 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 12de0e96-5d01-4ab0-b0b3-165f55999960 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-elkdat_ktest/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/elkdat_ktest.img,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e4:6f:3e,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
> > ```
>
> I also could reproduce this with Fedora 25 on Core i7-4770S,
> with below options.
>
> "-cpu Opteron_G3,+smap,+adx,+rdseed,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+perfctr_nb,+perfctr_core,+topoext,+tce,+wdt,+skinit,+osvw,+3dnowprefetch,+cr8legacy,+extapic,+cmp_legacy,+pdpe1gb,+fxsr_opt,+mmxext,+rdrand,+f16c,+avx,+osxsave,+xsave,+aes,+movbe,+sse4.2,+sse4.1,+fma,+ssse3,+pclmuldq,+vme -M pc -enable-kvm -M pc -enable-kvm "
>
> And a quick investigation showed that this crash happened when I
> replaced the "Opteron_G3" with "Opteron_G2", "Opteron_G1", "Westmere"
> and "Nehalem" (I didn't check older than that). But I didn't see the
> crash when I specify "Opteron_G4" or "Opteron_G5", or newer than
> "SandyBridge".
>
> So, I guess this maybe caused by the combinations of cpu model and
> flags which must not exist, maybe qemu changes available instruction
> set based on cpu model, but linux checks only cpu feature flag.

Yeah, probably so. I succeeded in booting my own kernel with the "Opteron_G5" model
after disabling "fma4", "tbm", and "xop", which Ryzen doesn't support.
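
For reference, a minimal sketch of how the corresponding -cpu argument could be
built. It assumes the legacy +feature/-feature syntax used in the command line
quoted above; the variable names are only illustrative, and the rest of the qemu
options are left out.

```shell
# Illustrative only: base model and the features to turn off.
model="Opteron_G5"
disabled="fma4 tbm xop"

# Append ",-<feature>" for each feature that should be disabled.
cpu_arg="$model"
for f in $disabled; do
    cpu_arg="$cpu_arg,-$f"
done

# Print the option as it would appear on the qemu command line.
echo "-cpu $cpu_arg"
```

With libvirt the same thing would normally be expressed in the domain XML rather
than on the command line.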

As Paolo said in another mail, the boot also succeeded in "host-passthrough" mode rather
than "host-model". I'll wait for a "Ryzen (or Zen?)" model to be added to qemu.
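
In case it helps anyone checking their own setup, here is a rough sketch for
listing which of these flags a host or guest actually reports. The function name
present_flags is hypothetical, and it assumes an x86 Linux /proc/cpuinfo with a
"flags" line.

```shell
# Hypothetical helper: read cpuinfo text on stdin and print, one per line,
# the flags that match the extended regex given as $1.
present_flags() {
    # Take the first "flags" line, drop the "flags :" prefix,
    # split into one flag per line, and keep only exact matches.
    grep -m1 '^flags' | cut -d: -f2 | tr ' ' '\n' | grep -xE "$1"
}

# Report which of the three features mentioned above are advertised.
present_flags 'fma4|tbm|xop' < /proc/cpuinfo || echo "none of fma4/tbm/xop reported"
```

Running it on the host and inside the guest should show whether the guest is
being offered features the host does not have.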

Regards,
Satoru

>
> Thank you,
>
> --
> Masami Hiramatsu <mhiramat@xxxxxxxxxx>