AMD erratum 665 on f15h processor?

From: Andrew Randrianasulu
Date: Sun Dec 17 2017 - 04:13:27 EST


Hello!

I was trying to investigate why all my old kernels can't be booted on my
relatively new machine. Kernels 4.10+ naturally boot - I use 4.14.3 right now -
but old kernels die early ...

After some digging I found this
https://patchwork.kernel.org/patch/9311567/

The patch talks about family 12h, but my machine has this CPU:

[ 0.056000] smpboot: CPU0: AMD FX(tm)-4300 Quad-Core Processor (family: 0x15, model: 0x2, stepping: 0x0)
[ 0.056000] Performance Events: Fam15h core perfctr, AMD PMU driver.


Because the fix is applied unconditionally, it probably helps me, so please
don't remove it.

The failure log from qemu with kernel 4.2 is attached:


.text : 0xc0100000 - 0xc046ceb7 (3507 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Hierarchical RCU implementation.
Build-time adjustment of leaf fanout to 32.
RCU restricting CPUs from NR_CPUS=16 to nr_cpu_ids=1.
RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=1
NR_IRQS:2304 nr_irqs:256 16
Console: colour VGA+ 80x60
console [tty0] enabled
clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
tsc: Fast TSC calibration failed
tsc: Unable to calibrate against PIT
tsc: HPET/PMTIMER calibration failed
tsc: Marking TSC unstable due to could not calculate TSC khz
Calibrating delay loop... 1253.37 BogoMIPS (lpj=2506752)
pid_max: default: 32768 minimum: 301
ACPI: Core revision 20150619
ACPI: All ACPI Tables successfully acquired
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
Initializing cgroup subsys net_cls
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-i486 #7
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
task: c05dba40 ti: c05d4000 task.ti: c05d4000
EIP: 0060:[<c010ec47>] EFLAGS: 00210202 CPU: 0
EIP is at cpu_has_amd_erratum+0x23/0xb2
EAX: 00210bf7 EBX: 00000001 ECX: c0010140 EDX: c0470b2c
ESI: c0630d00 EDI: c0470b30 EBP: c05d5f24 ESP: c05d5f14
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: ffc77000 CR3: 006d2000 CR4: 00040690
Stack:
02008140 00000000 c0630d00 00000000 c05d5f70 c010f446 000000d0 c05d5f5c
c01e0571 00000010 0000001e 00000000 00000000 00000009 00000010 00000000
c0630d00 00000000 c05d5f70 c010d74d 00000020 c0630d00 c0630d8b c05d5f9c
Call Trace:
[<c010f446>] init_amd+0x4e8/0x662
[<c01e0571>] ? kmem_cache_alloc_trace+0xbe/0xc8
[<c010d74d>] ? get_cpu_cap+0x127/0x12c
[<c010d936>] identify_cpu+0x1e4/0x366
[<c01e044c>] ? kmem_cache_alloc+0x90/0xf7
[<c01c7869>] ? kmem_cache_create+0x118/0x15b
[<c063f1ea>] identify_boot_cpu+0x10/0x99
[<c018fb35>] ? __delayacct_tsk_init+0x15/0x28
[<c063f2a6>] check_bugs+0x9/0x39
[<c0638ae3>] start_kernel+0x3a3/0x3b3
[<c063854d>] ? set_init_arg+0x52/0x52
[<c06382b8>] i386_start_kernel+0x82/0x86
Code: e0 eb 5d c0 89 e5 5d c3 55 89 e5 57 56 89 c6 53 51 8b 1a 8d 7a 04 81 fb ff ff 00 00 77 54 8b 40 2c f6 c4 02 74 4c b9 40 01 01 c0 <0f> 32 89 45 f0 89 d8 89 d1 99 39 ca 77 39 72 05 3b 5d f0 73 32
EIP: [<c010ec47>] cpu_has_amd_erratum+0x23/0xb2 SS:ESP 0068:c05d5f14
---[ end trace 8bfd5e6fa0a4fcb2 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
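
For anyone reading the oops above: the byte wrapped in <...> on the Code: line is where EIP was pointing when the fault hit. Below is a minimal Python sketch that picks that spot out of the byte dump. It is not a disassembler. It only pattern-matches two encodings I am sure of (0F 32 is RDMSR, 0F 30 is WRMSR) and assumes that a B9 byte five positions before the fault starts a `mov ecx, imm32`, which is a heuristic that happens to agree with the ECX value in the register dump. The names `faulting_bytes` and `describe` are made up for this example.

```python
# Code: line from the oops above, joined into a single string.
CODE = (
    "e0 eb 5d c0 89 e5 5d c3 55 89 e5 57 56 89 c6 53 51 8b 1a 8d 7a 04 81 fb "
    "ff ff 00 00 77 54 8b 40 2c f6 c4 02 74 4c b9 40 01 01 c0 <0f> 32 89 45 f0 "
    "89 d8 89 d1 99 39 ca 77 39 72 05 3b 5d f0 73 32"
)


def faulting_bytes(code):
    """Split the dump into (bytes before the fault, bytes from the fault on)."""
    tokens = code.split()
    # The kernel marks the byte at EIP with angle brackets, e.g. "<0f>".
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    before = [int(t, 16) for t in tokens[:idx]]
    at = [int(t.strip("<>"), 16) for t in tokens[idx:]]
    return before, at


def describe(code):
    """Guess the faulting instruction and the ECX immediate loaded before it."""
    before, at = faulting_bytes(code)
    info = {"insn": "unknown"}
    if at[:2] == [0x0F, 0x32]:
        info["insn"] = "rdmsr"
    elif at[:2] == [0x0F, 0x30]:
        info["insn"] = "wrmsr"
    # Heuristic (assumption): B9 imm32 directly before the fault is
    # "mov ecx, imm32", with the immediate stored little-endian.
    if len(before) >= 5 and before[-5] == 0xB9:
        info["ecx"] = int.from_bytes(bytes(before[-4:]), "little")
    return info


if __name__ == "__main__":
    info = describe(CODE)
    print(info["insn"], hex(info.get("ecx", 0)))
```

So the fault is on an MSR read inside cpu_has_amd_erratum, with the MSR number taken from ECX. As far as I can tell that number is the OSVW ID length MSR (MSR_AMD64_OSVW_ID_LENGTH in the kernel headers), but please check this against the headers before relying on it.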

Well, because this bug is apparently fixed and the fix has propagated to
-stable, it shouldn't concern me too much. But maybe someone in the future will
rearrange those checks and assume only some old AMD CPUs were affected ... so I
am leaving this message.

qemu cmd line:
qemu-system-x86_64 -M q35 -enable-kvm -cdrom /dev/shm/slax_16_12_2017_test.iso -m 512 -soundhw es1370 -cpu host -device sga -curses

-cpu host is really important here. I used VGA mode 6 (vga=6) blindly to get
as much output on screen as possible.