Re: AMD erratum 665 on f15h processor?

From: Borislav Petkov
Date: Mon Dec 18 2017 - 08:22:33 EST

+ kvm ML.

On Mon, Dec 18, 2017 at 06:01:21AM +0300, Andrew Randrianasulu wrote:
> In a message of Sunday 17 December 2017 23:52:05 you wrote:
> > On Sun, Dec 17, 2017 at 12:04:28PM +0300, Andrew Randrianasulu wrote:
> > > Hello!
> > >
> > > I was trying to investigate why all my old kernels can't be booted on my
> > > relatively new machine. Kernels 4.10+ naturally boot - I use 4.14.3 right
> > > now - but old kernels die early ...
> > >
> > > After some digging I found this
> > >
> > >
> > > The patch talks about family 12h, but my machine has this CPU:
> > >
> > > [ 0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor
> > > (family: 0x15, model: 0x2, stepping: 0x0)
> > > [ 0.056000] Performance Events: Fam15h core perfctr, AMD PMU driver.
> >
> > Yes, your machine is not affected by that erratum. So far so good.
> >
> > The rest of your mail I have a hard time understanding: you're talking
> > about old kernels not booting on a new machine but then you paste a qemu
> > 32-bit guest kernel boot log and after that I'm lost.
> >
> > Perhaps you should try again by explaining in detail what exactly you're
> > trying to do and how exactly you're going about doing that...
> Hi, Borislav!
> I was trying to boot a few self-made liveCDs/DVDs; they use self-compiled
> kernels in the 3.2-4.2 range. None of those old disks boots in qemu if I set
> the cpu type to 'host'. I have a whole collection of old kernels going back
> to 2011, and none of them works anymore! An even older CD with
> 2.6.23.something plainly rebooted after the kernel and initrd were loaded by
> isolinux on the physical machine! But it worked, at least in qemu (I don't
> really want to reboot the machine due to some stuff in tmpfs).
> So, because 4.2.0-i486 was my previous failsafe kernel, and it most likely
> will not work anymore, I guess I will use 4.12.0-x64. I was just trying to
> find any change explaining this error, and your fix was the closest I was
> able to find in this time interval (2015-2017). Maybe it was just some
> unrelated, purely software bug in the AMD detection code. I spent some time
> trying to figure out how to copy/paste from qemu; finally the -curses
> interface worked.
> I think I missed this misbehavior because I mostly used plain qemu, without
> -cpu host (but with -enable-kvm), so it worked without problems.

So -cpu host means:

x86 host KVM processor with all supported host features (only available in KVM mode)

which would theoretically mean that those guest kernel configs shouldn't
boot on the baremetal box either, if they fail on the guest.
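
One way to check that assumption is to compare the family/model/stepping
that the guest reports with what the host reports. A minimal sketch,
assuming the boot log contains the smpboot line in the format quoted
above (older kernels may print it differently); parse_cpu_id and
is_family_12h are hypothetical helper names, and the family 12h check
only reflects what the patch mentioned in this thread is said to target:

```python
import re

# Matches the "(family: 0x15, model: 0x2, stepping: 0x0)" part of the
# smpboot line. Assumption: the log uses this exact hex format.
CPU_ID_RE = re.compile(
    r"family:\s*(0x[0-9a-fA-F]+),\s*"
    r"model:\s*(0x[0-9a-fA-F]+),\s*"
    r"stepping:\s*(0x[0-9a-fA-F]+)"
)


def parse_cpu_id(log_text):
    """Return (family, model, stepping) as ints, or None if not found."""
    match = CPU_ID_RE.search(log_text)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


def is_family_12h(family):
    # Assumption from the thread: the erratum 665 patch is about family 12h.
    return family == 0x12


host_log = """[ 0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor
(family: 0x15, model: 0x2, stepping: 0x0)"""

family, model, stepping = parse_cpu_id(host_log)
print(hex(family), hex(model), hex(stepping), is_family_12h(family))
```

Running the same parser over the guest's boot log would show whether
-cpu host really hands the guest the same identification as the host.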

But who knows what's happening.

You can give me a guest kernel .config of a kernel which fails along
with the exact qemu cmdline to try out here.
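
For the cmdline, something along these lines would describe the two
cases from the report (works with -enable-kvm alone, fails once -cpu host
is added). This is only a sketch: the qemu-system-i386 binary name and
the live.iso path are assumptions, not the reporter's actual setup, and
qemu_argv is a hypothetical helper:

```python
def qemu_argv(iso_path, cpu_host):
    """Build a qemu command line for booting a live ISO under KVM."""
    # -curses is the text-mode display mentioned in the report.
    argv = ["qemu-system-i386", "-enable-kvm", "-cdrom", iso_path, "-curses"]
    if cpu_host:
        # Expose the host CPU model and features to the guest.
        argv += ["-cpu", "host"]
    return argv


# Print both variants so they can be pasted into a bug report.
for cpu_host in (False, True):
    print(" ".join(qemu_argv("live.iso", cpu_host)))
```

The only difference between the two printed lines is the trailing
"-cpu host", which keeps the comparison down to that one switch.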

(Leaving in the rest for reference.)

> When I first got this machine in early 2017 I already had 4.9+ as one of the
> possible kernels in the lilo menu, so when 4.2 failed I quickly booted the
> new kernel and forgot about it. Lately I compiled 4.12 to use it on a
> friend's machine with a new AMD video card, but the default in
> syslinux/isolinux was still set to 4.2.0, and it worked on another AMD
> machine. A few days ago I decided to make a new 'live backup' of my running
> system, and while playing with the new qemu I discovered this oddity.
> Still, for me it raises an interesting question: as far as I understand,
> qemu's BIOS (SeaBIOS) doesn't set all those cpu-specific workarounds/fixes,
> but with qemu -cpu host the guest kernel will see a nearly exact cpu model,
> and will try to apply (or not, assuming BIOS/firmware already set everything
> correctly?) some fixups, or at least run some detection code? Of course I can
> just compile a new kernel with those checks disabled, but the older kernels
> are already compiled ... and disabling those workarounds will lead to crashes
> later on, so having a runtime disable for them is not a good idea?
> Not sure if I will be able to get a real boot log from a physical machine
> boot; I don't think I compiled those old kernels with any way to store an
> early oops/panic ..:/
> Thanks for answering and sorry for a possible false positive bug report.


Good mailing practices for 400: avoid top-posting and trim the reply.