amd apu crashes

From: Steven J Abner
Date: Tue Feb 08 2022 - 17:27:22 EST


Hi
I've been trying out kernel 5.16. Lots of amdgpu upgrades? However it seems to be
getting worse :(
System: AMD Ryzen 5 2400G, elementary OS 5.1.7, Ubuntu 18.04.6 LTS, Linux 5.15.5-051505-generic, GTK 3.22.30.
Background: I was using 5.16.6 when it started its triple threat, so I went back to 5.15 in a panic. Previously, back in November, during my first triple threat, I was on a system with btrfs, which destroyed my hard drive.
I rebuilt with ext4 and am still trying to recreate the losses. I can't use a newer Ubuntu because I still need AFP to connect with my Mac for transfers, and elementary went even heavier with GTK, so it crawls. I did find a better workaround for AFP, but I'm not happy with Ubuntu's treatment of the bug.
The triple threat is when the monitor flashes 3 times before a total lockup. The last crash may have been one, but I was ready and hit reboot before the third flash, so there was no test of whether it would kill my hard drive.
I'm guessing it's not a true kernel problem, but GTK exploiting a weakness, probably an uninitialized pointer. But with newer kernels, the crashes seem to be more frequent.
Here are the last few:
$ journalctl -o short-precise -f -k -b -3
-- Logs begin at Mon 2022-01-03 17:21:50 EST. --
Feb 05 08:37:32.229754 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.230639 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.273370 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.668947 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.794231 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:32.919503 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.044753 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.169986 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.295263 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out
Feb 05 08:37:33.420514 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out

$ journalctl -o short-precise -f -k -b -2
-- Logs begin at Mon 2022-01-03 17:21:50 EST. --
Feb 07 06:11:47.495092 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0
Feb 07 06:11:47.495199 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:7 pasid:32782, for process WebKitWebProces pid 5037 thread WebKitWebP:cs0 pid 5101)
Feb 07 06:11:47.495304 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: in page starting at address 0x000080010e24d000 from IH client 0x12 (VMC)
Feb 07 06:11:47.495413 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 07 06:11:47.495520 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: Faulty UTCL2 client ID: MP1 (0x0)
Feb 07 06:11:47.495631 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 07 06:11:47.495766 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 07 06:11:47.495875 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 07 06:11:47.495987 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 07 06:11:47.496108 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0

$ journalctl -o short-precise -f -k -b -1
-- Logs begin at Mon 2022-01-03 17:21:50 EST. --
Feb 07 16:49:00.229782 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0
Feb 07 16:49:00.229898 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32769, for process Xorg pid 2061 thread Xorg:cs0 pid 2062)
Feb 07 16:49:00.230010 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: in page starting at address 0x0000800101955000 from IH client 0x12 (VMC)
Feb 07 16:49:00.230114 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 07 16:49:00.230220 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: Faulty UTCL2 client ID: MP1 (0x0)
Feb 07 16:49:00.230425 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 07 16:49:00.230535 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 07 16:49:00.230646 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 07 16:49:00.230771 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 07 16:49:00.230910 steven-ryzen kernel: amdgpu 0000:38:00.0: amdgpu: RW: 0x0
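
In case it helps anyone compare the boots, here is a minimal sketch (not something I ran to produce the logs above) that tallies how often each kernel message repeats in `journalctl -o short-precise -k` output. The function name `summarize` and the regular expression are my own assumptions about the line layout, based only on the lines quoted above.

```python
import re
from collections import Counter

# Assumed layout of a "short-precise" kernel line, taken from the logs above:
#   "Feb 05 08:37:32.229754 steven-ryzen kernel: AMD-Vi: Completion-Wait loop timed out"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<host>\S+) kernel: (?P<msg>.*)$"
)


def summarize(lines):
    """Count occurrences of each kernel message, ignoring timestamp and host.

    Lines that do not match the assumed layout (for example the
    "-- Logs begin at ... --" header) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts
```

For example, after saving one boot's kernel log to a file (the name `boot-1.log` is hypothetical), `summarize(open("boot-1.log")).most_common()` lists the messages from most to least frequent.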

I haven't dealt with kernel debugging for years, so please bear with me if more info is needed; I probably don't remember how to get it.
If this is a bother, I'm sorry to have troubled you.
Per 'Do I have to be subscribed to post to the list?':
I wish to be personally CC'ed on the answers/comments posted to the list in response to my posting, please.
Thanks,
Steve