Re: AMD erratum 665 on f15h processor?

From: Andrew Randrianasulu
Date: Tue Dec 19 2017 - 00:31:03 EST


In a message of Tuesday 19 December 2017 00:05:40, Borislav Petkov wrote:
> When you reply, please hit reply-to-all in your mail client so that
> mailing lists get CCed too.

ok.

>
> On Mon, Dec 18, 2017 at 07:54:52PM +0300, Andrew Randrianasulu wrote:
> > In a message of Monday 18 December 2017 16:22:15 you wrote:
> > > + kvm ML.
> > >
> > > On Mon, Dec 18, 2017 at 06:01:21AM +0300, Andrew Randrianasulu wrote:
> > > > In a message of Sunday 17 December 2017 23:52:05 you wrote:
> > > > > On Sun, Dec 17, 2017 at 12:04:28PM +0300, Andrew Randrianasulu wrote:
> > > > > > Hello!
> > > > > >
> > > > > > I was trying to investigate why all my old kernels can't be
> > > > > > booted on my relatively new machine. Kernels 4.10+ naturally boot
> > > > > > - I use 4.14.3 right now - but old kernels die early ...
> > > > > >
> > > > > > After some digging I found this
> > > > > > https://patchwork.kernel.org/patch/9311567/
> > > > > >
> > > > > > The patch talks about family 12h, but my machine has this CPU:
> > > > > >
> > > > > > [ 0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor
> > > > > > (family: 0x15, model: 0x2, stepping: 0x0)
> > > > > > [ 0.056000] Performance Events: Fam15h core perfctr, AMD PMU
> > > > > > driver.
> > > > >
> > > > > Yes, your machine is not affected by that erratum. So far so good.
> > > > >
> > > > > The rest of your mail I have a hard time understanding: you're
> > > > > talking about old kernels not booting on a new machine but then you
> > > > > paste a qemu 32-bit guest kernel boot log and after that I'm lost.
> > > > >
> > > > > Perhaps you should try again by explaining in detail what exactly
> > > > > you're trying to do and how exactly you're going about doing
> > > > > that...
> > > >
> > > > Hi, Borislav!
> > > >
> > > > I was trying to boot a few self-made live CDs/DVDs that use
> > > > self-compiled kernels in the 3.2-4.2 range. None of those old disks
> > > > boots in qemu if I set the CPU type to 'host'. I have a whole
> > > > collection of old kernels going back to 2011, and none of them work
> > > > anymore! An even older CD with 2.6.23.something simply rebooted the
> > > > physical machine after isolinux had loaded the kernel and initrd!
> > > > But 2.6.27.9 worked, at least in qemu (I don't really want to reboot
> > > > the machine because of some stuff in tmpfs). So, because 4.2.0-i486
> > > > was my previous failsafe kernel and most likely will not work
> > > > anymore, I guess I will use 4.12.0-x64. I was just trying to find
> > > > any change explaining this error, and your fix was the closest I was
> > > > able to find in this time interval (2015-2017). Maybe it was just
> > > > some unrelated, purely software bug in the AMD detection code. I
> > > > spent some time trying to figure out how to copy/paste from qemu;
> > > > finally the -curses interface worked.
> > > >
> > > > I think I missed this misbehavior because I mostly used just qemu,
> > > > without -cpu host (but with -enable-kvm), so it worked without
> > > > problems.
> > >
> > > So -cpu host means:
> > >
> > > x86 host KVM processor with all supported host features
> > > (only available in KVM mode)
> > >
> > > which would theoretically mean that those guest kernel configs
> > > shouldn't boot on the baremetal box either, if they fail on the guest.
> > >
> > > But who knows what's happening.
> > >
> > > You can give me a guest kernel .config of a kernel which fails along
> > > with the exact qemu cmdline to try out here.
> >
> > .config attached.
> >
> > To reproduce, just launch qemu like this:
> >
> > qemu-system-i386 -kernel /home/admin/slax-build/boot/vmlinuz -cpu host --enable-kvm
> >
> > (just tried).
> >
> > Of course, replace the path to the kernel image with your own. I can also
> > attach the binary image, but I think it will be of little use to you.
>
> Nah, I built it using your .config.
>
> So my guest stops very early in the BIOS with
>
> "Failed to allocate space for phdrs
>
> -- System halted."
>
> Then I looked at this:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=114671
>
> and there's a patch
>
> https://bugzilla.kernel.org/attachment.cgi?id=209601&action=diff&collapsed=&headers=1&format=raw


Thanks, looks like I will have more fun building a 32-bit kernel, because I
already updated binutils.

>
> With it, it booted a bit further. But I still couldn't see any output.
>
> So I booted with my cmdline to see more output and it did say:
>
> general protection fault: 0000 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-i486+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> task: c05b9a80 ti: c05b2000 task.ti: c05b2000
> EIP: 0060:[<c010e390>] EFLAGS: 00210293 CPU: 0
> EIP is at cpu_has_amd_erratum+0x24/0xb0
> EAX: 00210bf7 EBX: 00000001 ECX: c0010140 EDX: c044ccf4
> ESI: c0616900 EDI: c044ccf8 EBP: c05b3f68 ESP: c05b3f58
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: ffc77000 CR3: 006ae000 CR4: 00040690
> Stack:
> 02008140 00000000 c0616900 00000000 c05b3fa8 c010ec8b f5001d80 0000001e
> 00000000 00000000 00000009 00000010 00000000 c0616900 00000000 c05b3fa8
> c010cf58 c0616900 c0616900 c061695c c05b3fc8 c010d156 c061698b c061695c
> Call Trace:
> [<c010ec8b>] init_amd+0x5ee/0x631
> [<c010cf58>] ? get_cpu_cap+0x121/0x126
> [<c010d156>] identify_cpu+0x1f9/0x37d
> [<c0624a18>] identify_boot_cpu+0xd/0x80
> [<c0624abd>] check_bugs+0x8/0x35
> [<c061ea42>] start_kernel+0x32a/0x339
> [<c061e2c2>] i386_start_kernel+0x8c/0x90
> Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb
> ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45
> f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34
> EIP: [<c010e390>] cpu_has_amd_erratum+0x24/0xb0 SS:ESP 0068:c05b3f58
> ---[ end trace 7fb9e71b486a229a ]---
> Kernel panic - not syncing: Attempted to kill the idle task!
> ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
>
> Which is exactly like the splat you've posted and that fails:
>
> Code: cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb
> ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45
> f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34
>
> All code
> ========
> 0: cf iret
> 1: 5b pop %rbx
> 2: c0 89 e5 5d c3 55 89 rorb $0x89,0x55c35de5(%rcx)
> 9: e5 57 in $0x57,%eax
> b: 56 push %rsi
> c: 53 push %rbx
> d: 51 push %rcx
> e: 89 c6 mov %eax,%esi
> 10: 8b 1a mov (%rdx),%ebx
> 12: 8d 7a 04 lea 0x4(%rdx),%edi
> 15: 81 fb ff ff 00 00 cmp $0xffff,%ebx
> 1b: 77 57 ja 0x74
> 1d: 8b 40 2c mov 0x2c(%rax),%eax
> 20: 0f ba e0 09 bt $0x9,%eax
> 24: 73 4e jae 0x74
> 26: b9 40 01 01 c0 mov $0xc0010140,%ecx
> 2b:* 0f 32 rdmsr <-- trapping instruction
> 2d: 89 45 f0 mov %eax,-0x10(%rbp)
> 30: 89 d8 mov %ebx,%eax
> 32: 89 d1 mov %edx,%ecx
> 34: 99 cltd
> 35: 39 ca cmp %ecx,%edx
> 37: 77 3b ja 0x74
> 39: 72 05 jb 0x40
> 3b: 3b 5d f0 cmp -0x10(%rbp),%ebx
> 3e: 73 34 jae 0x74
>
> because it tries to read from a non-existent MSR - 0xc0010140 - and
> maybe it is because of the -cpu host emulation or so but those MSRs do
> get virtualized, see
>
> 2b036c6b861d ("KVM: SVM: Add support for AMD's OSVW feature in guests")
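
For reference, the MSR index can be read straight out of the "Code:" bytes
without a disassembler. A minimal Python sketch, assuming only the byte string
from the oops above (the helper name split_code is made up for illustration):

```python
import struct

# "Code:" bytes copied from the oops above; <0f> marks the byte at EIP.
CODE = (
    "cf 5b c0 89 e5 5d c3 55 89 e5 57 56 53 51 89 c6 8b 1a 8d 7a 04 81 fb "
    "ff ff 00 00 77 57 8b 40 2c 0f ba e0 09 73 4e b9 40 01 01 c0 <0f> 32 89 45 "
    "f0 89 d8 89 d1 99 39 ca 77 3b 72 05 3b 5d f0 73 34"
)


def split_code(line):
    """Return (all bytes, offset of the byte the oops marks with <..>)."""
    tokens = line.split()
    trap = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens), trap


code, trap = split_code(CODE)

# 0f 32 is the RDMSR opcode; RDMSR takes the MSR index from ECX.
faulting_insn = code[trap:trap + 2]

# The five bytes before it are "b9 imm32" (mov $imm32,%ecx); the
# immediate is stored little-endian.
opcode = code[trap - 5]
(msr_index,) = struct.unpack_from("<I", code, trap - 4)

print(f"trap offset {trap:#x}, insn {faulting_insn.hex()}, ECX = {msr_index:#x}")
```

This prints "trap offset 0x2b, insn 0f32, ECX = 0xc0010140", which agrees with
the "2b:*" marker in the decoded listing and with the ECX value in the register
dump.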

Thanks again, the patch "Add support for AMD's OSVW feature in guests" answered my
question about virtualizing somewhat buggy CPUs.
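
To make the failure mode concrete, here is a simplified Python model of the
check as I read it from the disassembly: a test of bit 9 in a capability word,
then rdmsr of 0xc0010140. It is a sketch, not the kernel code. FakeCpu,
GeneralProtectionFault and osvw_says_erratum are hypothetical names, the status
MSR index 0xc0010141 is not mentioned in this thread, and osvw_id=1 is an
arbitrary example value. The model shows one situation that would produce this
oops (OSVW advertised, MSR read faulting); whether that is what -cpu host
really did here is not established.

```python
MSR_OSVW_ID_LENGTH = 0xC0010140  # the index seen in ECX in the oops
MSR_OSVW_STATUS = 0xC0010141     # assumption: not mentioned in the thread


class GeneralProtectionFault(Exception):
    """Stands in for the #GP that RDMSR raises on an unimplemented MSR."""


class FakeCpu:
    """Hypothetical stand-in for a (virtual) CPU: one CPUID flag plus MSRs."""

    def __init__(self, has_osvw, msrs):
        self.has_osvw = has_osvw
        self.msrs = dict(msrs)

    def rdmsr(self, index):
        if index not in self.msrs:
            raise GeneralProtectionFault(hex(index))
        return self.msrs[index]


def osvw_says_erratum(cpu, osvw_id):
    """Return True/False if OSVW answers, None if the caller must fall back."""
    if not cpu.has_osvw:
        return None  # OSVW not advertised: the MSRs are never touched
    length = cpu.rdmsr(MSR_OSVW_ID_LENGTH)  # faults if the MSR is missing
    if osvw_id >= length:
        return None
    status = cpu.rdmsr(MSR_OSVW_STATUS + (osvw_id >> 6))
    return bool(status & (1 << (osvw_id & 0x3F)))


# Flag visible to the guest, but no MSR behind it.
guest = FakeCpu(has_osvw=True, msrs={})
try:
    osvw_says_erratum(guest, osvw_id=1)
except GeneralProtectionFault as exc:
    print("rdmsr faulted on", exc)
```

With has_osvw=False the function returns None without touching any MSR, which
is consistent with a CPU model that does not advertise OSVW getting past this
point.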

>
> but I'd refer to the kvm/qemu people to explain what the deal here
> exactly is.
>
> What I do, is use -cpu Opteron_G5 which is also F15h and that works.
> Oh, and I'd use 64-bit kernels - 32-bit is not really being tested as
> extensively.

-cpu Opteron_G5 works here, too.
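
For anyone repeating the comparison, the two invocations differ only in the
-cpu value. A small Python sketch that builds both command lines; the helper
name qemu_argv is made up, the kernel path is the one from my earlier mail, and
I am assuming no other options changed between the failing and the working run:

```python
import shlex


def qemu_argv(kernel, cpu_model):
    """Build the qemu-system-i386 command line for a given -cpu model."""
    return ["qemu-system-i386", "-kernel", kernel,
            "-cpu", cpu_model, "-enable-kvm"]


KERNEL = "/home/admin/slax-build/boot/vmlinuz"  # replace with your own image

failing = qemu_argv(KERNEL, "host")        # hits the rdmsr fault here
working = qemu_argv(KERNEL, "Opteron_G5")  # boots here

for argv in (failing, working):
    print(shlex.join(argv))
```

Either list can be passed to subprocess.run() as-is to launch the guest.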


>
> HTH.