AMDGPU crash on 5.4.7 on AMD Athlon 3000G APU

From: A L
Date: Thu Jan 02 2020 - 15:54:12 EST


Dear all,

There seems to be a regression between kernel 5.4.6 and 5.4.7. When I change from kernel 5.4.6 to kernel 5.4.7 I can no longer load the AMDGPU driver. The kernel immediately crashes with the following stack trace and errors. The system has to be hard reset to boot again.

[  320.086318] [drm] amdgpu kernel modesetting enabled.
[  320.086382] Parsing CRAT table with 1 nodes
[  320.086388] Creating topology SYSFS entries
[  320.086425] Topology: Add APU node [0x0:0x0]
[  320.086427] Finished initializing topology
[  320.086545] amdgpu 0000:06:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[  320.086549] amdgpu 0000:06:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
[  320.086552] amdgpu 0000:06:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xfce00000 -> 0xfce7ffff
[  320.086554] checking generic (e0000000 7f0000) vs hw (e0000000 10000000)
[  320.086557] fb0: switching to amdgpudrmfb from VESA VGA
[  320.086647] Console: switching to colour dummy device 80x25
[  320.086673] amdgpu 0000:06:00.0: vgaarb: deactivate vga console
[  320.086839] [drm] initializing kernel modesetting (RAVEN 0x1002:0x15D8 0x1002:0x15D8 0xCC).
[  320.086850] [drm] register mmio base: 0xFCE00000
[  320.086851] [drm] register mmio size: 524288
[  320.086868] [drm] add ip block number 0 <soc15_common>
[  320.086869] [drm] add ip block number 1 <gmc_v9_0>
[  320.086869] [drm] add ip block number 2 <vega10_ih>
[  320.086870] [drm] add ip block number 3 <psp>
[  320.086870] [drm] add ip block number 4 <gfx_v9_0>
[  320.086871] [drm] add ip block number 5 <sdma_v4_0>
[  320.086871] [drm] add ip block number 6 <powerplay>
[  320.086872] [drm] add ip block number 7 <dm>
[  320.086873] [drm] add ip block number 8 <vcn_v1_0>
[  320.112116] [drm] BIOS signature incorrect 0 0
[  320.112142] ATOM BIOS: 113-RAVEN2-115
[  320.112773] [drm] VCN decode is enabled in VM mode
[  320.112774] [drm] VCN encode is enabled in VM mode
[  320.112774] [drm] VCN jpeg decode is enabled in VM mode
[  320.112807] [drm] vm size is 262144 GB, 3 levels, block size is 9-bit, fragment size is 9-bit
[  320.112813] amdgpu 0000:06:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[  320.112814] amdgpu 0000:06:00.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[  320.112815] amdgpu 0000:06:00.0: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[  320.112818] [drm] Detected VRAM RAM=2048M, BAR=2048M
[  320.112818] [drm] RAM width 128bits DDR4
[  320.112858] [TTM] Zone  kernel: Available graphics memory: 3052426 KiB
[  320.112858] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[  320.112859] [TTM] Initializing pool allocator
[  320.112861] [TTM] Initializing DMA pool allocator
[  320.112913] [drm] amdgpu: 2048M of VRAM memory ready
[  320.112915] [drm] amdgpu: 3072M of GTT memory ready.
[  320.112923] [drm] GART: num cpu pages 262144, num gpu pages 262144
[  320.113067] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[  320.119226] [drm] use_doorbell being set to: [true]
[  320.119270] amdgpu: [powerplay] hwmgr_sw_init smu backed is smu10_smu
[  320.121347] [drm] Found VCN firmware Version: 1.86 Family ID: 18
[  320.121354] [drm] PSP loading VCN firmware
[  320.142076] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
[  320.202902] [drm] failed to load ucode id (18)
[  320.202904] [drm] psp command failed and response status is (0x300F)
[  320.205881] [drm] failed to load ucode id (19)
[  320.205883] [drm] psp command failed and response status is (0xF)
[  320.208882] [drm] failed to load ucode id (20)
[  320.208883] [drm] psp command failed and response status is (0xF)
[  320.229776] [drm] DM_PPLIB: values for F clock
[  320.229778] [drm] DM_PPLIB:     0 in kHz, 3649 in mV
[  320.229779] [drm] DM_PPLIB:     0 in kHz, 0 in mV
[  320.229780] [drm] DM_PPLIB:     0 in kHz, 0 in mV
[  320.229780] [drm] DM_PPLIB:     0 in kHz, 0 in mV
[  320.229797] ------------[ cut here ]------------
[  320.229949] WARNING: CPU: 1 PID: 5908 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1464 dcn_bw_update_from_pplib+0x94/0x2c0 [amdgpu]
[  320.229950] Modules linked in: amdgpu(+) gpu_sched ttm ip_set_hash_ip xt_state ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6table_raw ip6table_mangle xt_multiport ip6table_nat nfnetlink_log xt_limit xt_NFLOG ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter iptable_mangle xt_nat iptable_nat xt_CT iptable_raw ip_set_bitmap_port ip_set_hash_net nf_nat_pptp nf_conntrack_pptp nf_nat xt_sctp nf_conntrack_sip nf_conntrack_irc nf_conntrack_ftp nf_conntrack_h323 nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_bridge nf_conntrack nf_defrag_ipv6 ip6_tables ip_tables xt_recent xt_set ip_set nfnetlink nf_defrag_ipv4 nf_socket_ipv4 uas pinctrl_amd
[  320.229974] CPU: 1 PID: 5908 Comm: modprobe Not tainted 5.4.7-gentoo-test2 #3
[  320.229975] Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F50 11/27/2019
[  320.230113] RIP: 0010:dcn_bw_update_from_pplib+0x94/0x2c0 [amdgpu]
[  320.230116] Code: 0c 24 85 c9 74 24 8d 71 ff 48 8d 44 24 04 48 8d 54 f4 0c eb 0d 48 83 c0 08 48 39 d0 0f 84 13 01 00 00 44 8b 00 45 85 c0 75 eb <0f> 0b e8 65 3e d4 e0 4c 89 e2 be 04 00 00 00 4c 89 ef e8 a5 9b fe
[  320.230117] RSP: 0018:ffffc9000045b700 EFLAGS: 00010246
[  320.230119] RAX: ffffc9000045b704 RBX: ffff88812c700000 RCX: 0000000000000004
[  320.230120] RDX: ffffc9000045b724 RSI: 0000000000000003 RDI: ffff888218856350
[  320.230121] RBP: ffffc9000045b840 R08: 0000000000000000 R09: 00000000000003c5
[  320.230122] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc9000045b790
[  320.230123] R13: ffff8881341c8980 R14: 0000000000000001 R15: 000000000000000b
[  320.230125] FS:  00007ff865e7db80(0000) GS:ffff888218840000(0000) knlGS:0000000000000000
[  320.230126] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  320.230127] CR2: 00007fd68de9f540 CR3: 000000012c4f8000 CR4: 00000000003406e0
[  320.230128] Call Trace:
[  320.230135]  ? kmem_cache_alloc+0xe6/0x180
[  320.230271]  dcn10_create_resource_pool+0x7d9/0xb10 [amdgpu]
[  320.230406]  ? firmware_parser_create+0x6fb/0x720 [amdgpu]
[  320.230533]  dc_create_resource_pool+0x21/0x100 [amdgpu]
[  320.230660]  dc_create+0x206/0x680 [amdgpu]
[  320.230663]  ? kmem_cache_alloc+0xe6/0x180
[  320.230795]  amdgpu_dm_init+0x138/0x1c0 [amdgpu]
[  320.230800]  ? common_interrupt+0xa/0xf
[  320.230929]  ? phm_wait_for_register_unequal.part.0+0x44/0x70 [amdgpu]
[  320.231059]  dm_hw_init+0x9/0x20 [amdgpu]
[  320.231191]  amdgpu_device_init.cold+0xf47/0x129e [amdgpu]
[  320.231194]  ? __alloc_pages_nodemask+0x128/0x240
[  320.231300]  amdgpu_driver_load_kms+0x44/0xe0 [amdgpu]
[  320.231305]  drm_dev_register+0x109/0x150
[  320.231410]  amdgpu_pci_probe+0xe9/0x150 [amdgpu]
[  320.231414]  ? __pm_runtime_resume+0x44/0x50
[  320.231417]  local_pci_probe+0x38/0x70
[  320.231419]  ? pci_match_device+0xd2/0x100
[  320.231422]  pci_device_probe+0xe4/0x190
[  320.231425]  really_probe+0xdf/0x290
[  320.231427]  driver_probe_device+0x4b/0xc0
[  320.231430]  device_driver_attach+0x4e/0x60
[  320.231432]  __driver_attach+0x44/0xb0
[  320.231434]  ? device_driver_attach+0x60/0x60
[  320.231436]  bus_for_each_dev+0x5c/0x90
[  320.231438]  bus_add_driver+0x16d/0x1c0
[  320.231440]  driver_register+0x67/0xb0
[  320.231442]  ? 0xffffffffa0597000
[  320.231444]  do_one_initcall+0x44/0x16f
[  320.231447]  ? __vunmap+0x223/0x260
[  320.231449]  ? kmem_cache_alloc+0xe6/0x180
[  320.231452]  do_init_module+0x51/0x200
[  320.231455]  load_module+0x20d6/0x23d0
[  320.231458]  ? vfs_read+0x117/0x140
[  320.231461]  ? __do_sys_finit_module+0x9b/0xb0
[  320.231464]  __do_sys_finit_module+0x9b/0xb0
[  320.231466]  do_syscall_64+0x3d/0x100
[  320.231469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  320.231471] RIP: 0033:0x7ff865f9e289
[  320.231474] Code: 00 00 00 75 05 48 83 c4 18 c3 e8 c2 5f 01 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d7 4b 09 00 f7 d8 64 89 01 48
[  320.231475] RSP: 002b:00007ffe140b9878 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  320.231476] RAX: ffffffffffffffda RBX: 000055eeb9140ab0 RCX: 00007ff865f9e289
[  320.231477] RDX: 0000000000000000 RSI: 000055eeb8c5533c RDI: 0000000000000005
[  320.231478] RBP: 0000000000040000 R08: 0000000000000000 R09: 000055eeb9140ca0
[  320.231479] R10: 0000000000000005 R11: 0000000000000246 R12: 000055eeb8c5533c
[  320.231480] R13: 0000000000000000 R14: 000055eeb9140be0 R15: 000055eeb9140ab0
[  320.231482] ---[ end trace 4d7f7927484d9651 ]---
[  320.231529] [drm] DM_PPLIB: values for DCF clock
[  320.231530] [drm] DM_PPLIB:     300000 in kHz, 3649 in mV
[  320.231531] [drm] DM_PPLIB:     600000 in kHz, 3974 in mV
[  320.231532] [drm] DM_PPLIB:     626000 in kHz, 4174 in mV
[  320.231532] [drm] DM_PPLIB:     654000 in kHz, 4325 in mV
[  320.237932] [drm] Display Core initialized with v3.2.48!
[  320.238368] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  320.238369] [drm] Driver supports precise vblank timestamp query.
[  320.249980] [drm] VCN decode and encode initialized successfully(under SPG Mode).
[  320.251158] kfd kfd: Allocated 3969056 bytes on gart
[  320.251662] kfd kfd: Failed to resume IOMMU for device 1002:15d8
[  320.251891] kfd kfd: device 1002:15d8 NOT added due to errors
[  320.251962] [drm] Cannot find any crtc or sizes
[  320.252163] amdgpu 0000:06:00.0: ring gfx uses VM inv eng 0 on hub 0
[  320.252167] amdgpu 0000:06:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  320.252169] amdgpu 0000:06:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  320.252172] amdgpu 0000:06:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  320.252174] amdgpu 0000:06:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  320.252176] amdgpu 0000:06:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  320.252178] amdgpu 0000:06:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  320.252181] amdgpu 0000:06:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  320.252183] amdgpu 0000:06:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  320.252185] amdgpu 0000:06:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[  320.252186] amdgpu 0000:06:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[  320.252188] amdgpu 0000:06:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[  320.252190] amdgpu 0000:06:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[  320.252192] amdgpu 0000:06:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[  320.252194] amdgpu 0000:06:00.0: ring vcn_jpeg uses VM inv eng 6 on hub 1
[  320.401903] AMD-Vi: Completion-Wait loop timed out
[  320.542017] AMD-Vi: Completion-Wait loop timed out
[  320.682015] AMD-Vi: Completion-Wait loop timed out
[  320.822091] AMD-Vi: Completion-Wait loop timed out
[  320.962130] AMD-Vi: Completion-Wait loop timed out
[  321.088018] AMD-Vi: Completion-Wait loop timed out
[  321.214038] AMD-Vi: Completion-Wait loop timed out
[  321.263146] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=06:00.0 address=0x217879410]
[  322.278079] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[  322.278081] clocksource:                       'hpet' wd_now: 1327ac6e wd_last: 129c4f3c mask: ffffffff
[  322.278082] clocksource:                       'tsc' cs_now: 11ce84c06c0 cs_last: 11c76631aad mask: ffffffffffffffff
[  322.278084] tsc: Marking TSC unstable due to clocksource watchdog
[  322.369132] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[  322.369134] sched_clock: Marking unstable (322385637527, -16091409)<-(322450384861, -81255237)
[  322.734191] clocksource: Switched to clocksource hpet
[  336.585879] hpet: Lost 4 RTC interrupts

* The full paste is available at (1)
* lspci -vk paste is available at (2)
* kernel .config is available at (3)
* sys-kernel/linux-firmware-20191215 is installed.

The previous kernel, 5.4.6, worked with no crashes. It still produced the same stack trace, but none of the "Completion-Wait loop timed out" or "iommu ivhd0: AMD-Vi" errors. The same kernel .config was used for both kernels.
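
For anyone who wants to repeat the comparison between the two kernels, here is a minimal sketch of how two dmesg captures could be compared once the timestamps are ignored. The function names are hypothetical and not part of any existing tool, and the sample "old" log in the demo is an invented single line, not real 5.4.6 output.

```python
import re

# Matches the leading dmesg timestamp, e.g. "[  320.086318] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamp(line):
    """Return a dmesg line without its leading timestamp and trailing newline."""
    return TIMESTAMP.sub("", line.rstrip("\n"))


def new_messages(old_log, new_log):
    """Return messages present in new_log but absent from old_log.

    Messages are compared without timestamps, reported in order of first
    appearance, and each distinct message is reported only once.
    """
    seen = {strip_timestamp(line) for line in old_log}
    result = []
    for line in new_log:
        msg = strip_timestamp(line)
        if msg and msg not in seen:
            seen.add(msg)
            result.append(msg)
    return result


if __name__ == "__main__":
    # Invented sample standing in for a capture from the working kernel.
    old = ["[  100.000001] [drm] amdgpu kernel modesetting enabled.\n"]
    # Lines taken from the 5.4.7 log above.
    new = [
        "[  320.086318] [drm] amdgpu kernel modesetting enabled.\n",
        "[  320.401903] AMD-Vi: Completion-Wait loop timed out\n",
        "[  320.542017] AMD-Vi: Completion-Wait loop timed out\n",
    ]
    for msg in new_messages(old, new):
        print(msg)
```

With real captures, the two lists would come from reading the saved dmesg output of each kernel. Messages that embed addresses or PIDs will differ between boots and show up as new, so the result still needs a manual look.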

The system is a small headless machine with the new low-power AMD Athlon 3000G APU with integrated Vega 3 graphics (4).
The motherboard is a Gigabyte B450M. Two Intel PCIe NICs are present.

1) http://dpaste.com/0X9FWCW
2) http://dpaste.com/10R7J9H
3) http://dpaste.com/064XG5E
4) https://www.amd.com/en/products/apu/amd-athlon-3000g

Regards,
Anders