Re: AMD erratum 665 on f15h processor?

From: Andrew Randrianasulu
Date: Sun Dec 17 2017 - 22:10:05 EST


On Sunday 17 December 2017 23:52:05 you wrote:
> On Sun, Dec 17, 2017 at 12:04:28PM +0300, Andrew Randrianasulu wrote:
> > Hello!
> >
> > I was trying to investigate why all my old kernels can't be booted on my
> > relatively new machine. Kernels 4.10+ naturally boot - I use 4.14.3 right
> > now - but old kernels die early ...
> >
> > After some digging I found this
> > https://patchwork.kernel.org/patch/9311567/
> >
> > Patch talk about family 12h, but my machine has this CPU:
> >
> > [ 0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor
> > (family: 0x15, model: 0x2, stepping: 0x0)
> > [ 0.056000] Performance Events: Fam15h core perfctr, AMD PMU driver.
>
> Yes, your machine is not affected by that erratum. So far so good.
>
> The rest of your mail I have hard time understanding: you're talking
> about old kernels not booting on a new machine but then you paste a qemu
> 32-bit guest kernel boot log and after that I'm lost.
>
> Perhaps you should try again by explaining in detail what exactly you're
> trying to do and how exactly you're going about doing that...

Hi, Borislav!

I was trying to boot a few self-made live CDs/DVDs - they use self-compiled
kernels in the 3.2-4.2 range. None of those old disks boots in qemu if I set
the cpu type to 'host'. I have a whole collection of old kernels going back to
2011, and none of them work anymore! An even older CD with 2.6.23.something
simply rebooted on the physical machine right after isolinux loaded the kernel
and initrd. But 2.6.27.9 worked, at least in qemu (I don't really want to
reboot the machine because of some stuff in tmpfs). So, because 4.2.0-i486 was
my previous failsafe kernel, and it most likely will not work anymore, I guess
I will use 4.12.0-x64. I was just trying to find any change that would explain
this error, and your fix was the closest I was able to find in this time
interval (2015-2017). Maybe it was just some unrelated, purely software bug in
the AMD detection code. I spent some time trying to figure out how to
copy/paste from qemu; finally the -curses interface worked.

I think I missed this misbehavior because I mostly used plain qemu, without
-cpu host (but with -enable-kvm), and that worked without problems.
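
One way to compare what the guest sees with and without -cpu host is to read
/proc/cpuinfo inside each guest and look at the identifying fields. A minimal
Python sketch, assuming the usual x86 Linux field names ("vendor_id",
"cpu family", "model", "stepping"); the function name cpu_identity and the
sample text are hypothetical, with the sample values taken from the boot log
quoted above (family 0x15 is 21 in decimal):

```python
def cpu_identity(cpuinfo_text):
    """Return identifying fields of the first CPU in /proc/cpuinfo-style text."""
    wanted = ("vendor_id", "cpu family", "model", "model name", "stepping")
    found = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if found:
                break  # a blank line ends the first processor's block
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted and key not in found:
            found[key] = value.strip()
    return found


# Illustrative sample only (not captured from a real guest).
SAMPLE = (
    "processor\t: 0\n"
    "vendor_id\t: AuthenticAMD\n"
    "cpu family\t: 21\n"
    "model\t\t: 2\n"
    "model name\t: AMD FX(tm)-4300 Quad-Core Processor\n"
    "stepping\t: 0\n"
)

identity = cpu_identity(SAMPLE)
```

Inside a guest, the same function could be fed the contents of /proc/cpuinfo;
differing "cpu family"/"model" values between the two qemu invocations would
show which CPU identity the old kernels are reacting to.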

When I first got this machine in early 2017 I already had 4.9+ as one of the
possible kernels in the lilo menu, so when 4.2 failed I quickly booted the
newer kernel and forgot about it. Recently I compiled 4.12 to use on a
friend's machine with a new AMD video card - but the default in
syslinux/isolinux was still set to 4.2.0, and it worked on another AMD
machine. A few days ago I decided to make a new 'live backup' of my running
system, and while playing with the new qemu I discovered this oddity.

Still, for me it raises an interesting question: as far as I understand, qemu's
BIOS (SeaBIOS) doesn't set all those cpu-specific workarounds/fixes - but with
qemu -cpu host the guest kernel will see a nearly exact cpu model, and will try
to apply some fixups (or not, assuming the BIOS/firmware already set everything
correctly?), or at least run some detection code? Of course I can just compile
a new kernel with those checks disabled, but the older kernels are already
compiled ... and disabling those workarounds would lead to crashes later on, so
having a runtime switch to disable them is not a good idea?
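
For reference, the family/model/stepping values a kernel prints at boot are
decoded from the CPUID leaf 1 EAX signature. A minimal Python sketch of that
decoding, following the rule that the extended family is added only when the
base family is 0xF and the extended model is used for families 6 and up. The
function name decode_signature is hypothetical, and the value 0x00600F20 is an
assumption derived from the boot log above (family 0x15, model 0x2, stepping
0x0), not something read from the machine:

```python
def decode_signature(eax):
    """Decode a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    family = base_family
    if base_family == 0xF:
        family += ext_family  # extended family only counts on top of 0xF

    model = base_model
    if family >= 0x6:
        model |= ext_model << 4  # extended model supplies the high nibble

    return family, model, stepping


# Assumed signature matching the log: family 0x15, model 0x2, stepping 0x0.
family, model, stepping = decode_signature(0x00600F20)
```

With -cpu host the guest gets (nearly) the host's signature, so any
family-specific detection code in the guest kernel takes the same path it
would on the bare machine.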

Not sure if I will be able to get a real boot log from a boot on the physical
machine - I don't think I compiled those old kernels with any way to store an
early oops/panic ..:/

Thanks for answering, and sorry for a possibly false-positive bug report.