amdgpu crash

From: Peter Maloney
Date: Thu Oct 06 2016 - 05:52:49 EST


Hi,

I seem to have a crash in amdgpu. It results in a black screen with the
monitors in power-save mode, but SysRq still works for rebooting.
(Is this the right place to report it?)

It never failed this way with kernel 4.5.7, but with kernel 4.7.6 it
fails every day after the machine has been idle for a long time.

> Oct 4 19:09:51 peter kernel: INFO: task plasmashell:3200 blocked for more than 120 seconds.
> Oct 4 19:09:51 peter kernel: Not tainted 4.7.6-1-grsec-kvm-host #24
> Oct 4 19:09:51 peter kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 4 19:09:51 peter kernel: plasmashell D ffffc9000318b890 0 3200 3091 0x00000080
> Oct 4 19:09:51 peter kernel: ffffc9000318b890 ffff880469e56540 ffff88046da5a1c0 dad31f4a396d9287
> Oct 4 19:09:51 peter kernel: ffff880469e56548 ffff88045b4a90e8 ffff88045b4a90e8 0000000000000001
> Oct 4 19:09:51 peter kernel: ffff880450ad4800 ffffc9000318b8a8 ffffffff816be89b ffff880450ad4800
> Oct 4 19:09:51 peter kernel: Call Trace:
> Oct 4 19:09:51 peter kernel: [<ffffffff816be89b>] schedule+0x3b/0xa0
> Oct 4 19:09:51 peter kernel: [<ffffffffa069699b>] amd_sched_entity_push_job+0x6b/0xe0 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffff810caac0>] ? wake_atomic_t_function+0xc0/0xc0
> Oct 4 19:09:51 peter kernel: [<ffffffffa06976af>] amdgpu_job_submit+0xaf/0x120 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa06223d0>] amdgpu_vm_bo_update_mapping+0x2e0/0x4f0 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa0622702>] amdgpu_vm_bo_split_mapping+0x122/0x150 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa0623887>] amdgpu_vm_bo_update+0x157/0x270 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa06132ab>] amdgpu_gem_va_update_vm+0x1bb/0x1e0 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffff81368201>] ? __list_add+0x11/0x90
> Oct 4 19:09:51 peter kernel: [<ffffffffa0614352>] amdgpu_gem_va_ioctl+0x242/0x310 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa06d5840>] ? amdgpu_exit+0x188d/0x90fd1 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa052a982>] drm_ioctl+0x362/0x6c0 [drm]
> Oct 4 19:09:51 peter kernel: [<ffffffffa0614110>] ? amdgpu_gem_metadata_ioctl+0x1e0/0x1e0 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffffa05f9057>] amdgpu_drm_ioctl+0x47/0x90 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffff812132e0>] do_vfs_ioctl+0xd0/0xb30
> Oct 4 19:09:51 peter kernel: [<ffffffff81221ad9>] ? __fget+0x79/0xb0
> Oct 4 19:09:51 peter kernel: [<ffffffff81213dbd>] sys_ioctl+0x7d/0xa0
> Oct 4 19:09:51 peter kernel: [<ffffffff810035f6>] do_syscall_64+0x56/0xf0
> Oct 4 19:09:51 peter kernel: [<ffffffff816c33be>] entry_SYSCALL64_slow_path+0x25/0x25
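
The frames in a trace like the one above can be pulled out mechanically,
which makes it easier to see that the task is blocked in
amd_sched_entity_push_job. A minimal Python sketch, assuming the 4.x-style
"[<address>] symbol+offset/size [module]" frame format shown here; FRAME_RE,
parse_frames and SAMPLE are illustrative names, not part of any kernel tool:

```python
import re

# One frame of a 4.x-style x86 call trace, for example:
#   [<ffffffffa06976af>] amdgpu_job_submit+0xaf/0x120 [amdgpu]
# A "?" before the symbol marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text):
    """Return (symbol, module, reliable) tuples for every frame in text.

    Mail quoting (">") is stripped and all lines are joined first, so a
    frame that a mail client wrapped onto two lines is still matched.
    """
    lines = (line.strip().lstrip(">").strip() for line in text.splitlines())
    joined = " ".join(line for line in lines if line)
    return [
        (m.group("symbol"), m.group("module"), m.group("unreliable") is None)
        for m in FRAME_RE.finditer(joined)
    ]


# Three frames copied from the report, including the mail wrapping.
SAMPLE = """\
> Oct 4 19:09:51 peter kernel: [<ffffffff816be89b>] schedule+0x3b/0xa0
> Oct 4 19:09:51 peter kernel: [<ffffffffa069699b>]
> amd_sched_entity_push_job+0x6b/0xe0 [amdgpu]
> Oct 4 19:09:51 peter kernel: [<ffffffff810caac0>] ?
> wake_atomic_t_function+0xc0/0xc0
"""

for symbol, module, reliable in parse_frames(SAMPLE):
    print(symbol, module or "(core kernel)", "" if reliable else "(unreliable)")
```

Filtering the result on module == "amdgpu" and reliable gives the
driver-side path from the ioctl down to the scheduler wait.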


> root@peter:~ # uname -a
> Linux peter 4.7.6-1-grsec-kvm-host #24 SMP PREEMPT Tue Oct 4 12:43:34 CEST 2016 x86_64 GNU/Linux

> xorg-server 1.18.4-1
> xf86-video-amdgpu 1.1.2-1
> plasma-desktop 5.7.5-1
> plasma-framework 5.26.0-1
> plasma-workspace 5.7.5-1


> root@peter:~ # lspci -k | grep -E "VGA|in use" | grep -A1 "VGA"
> 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XTX [Radeon R7 260X/360]
> Kernel driver in use: vfio-pci
> --
> 07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tobago PRO [Radeon R7 360 / R9 360 OEM] (rev 81)
> Kernel driver in use: amdgpu

(Only the second card, 07:00.0, is in use here; it is the primary one
[the numbering is in reverse order on this machine]. The other card has
vfio-pci bound early via an initcpio hook, and it has not been used by
qemu since the last reboot.)
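
The driver binding described above can also be read straight from sysfs
rather than from lspci output. A minimal Python sketch, assuming the usual
/sys/bus/pci/devices layout (a "class" file and a "driver" symlink per
device); vga_drivers is an illustrative helper name, not an existing tool:

```python
import os


def vga_drivers(sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> bound driver name (None if unbound) for VGA devices."""
    result = {}
    for addr in sorted(os.listdir(sysfs_root)):
        dev = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev, "class")) as f:
                pci_class = f.read().strip()
        except OSError:
            continue  # not a device directory, or unreadable
        # Class code 0x0300xx is "VGA compatible controller".
        if not pci_class.startswith("0x0300"):
            continue
        link = os.path.join(dev, "driver")
        # The driver symlink only exists while a driver is bound.
        if os.path.islink(link):
            result[addr] = os.path.basename(os.readlink(link))
        else:
            result[addr] = None
    return result


if __name__ == "__main__" and os.path.isdir("/sys/bus/pci/devices"):
    for addr, driver in vga_drivers().items():
        print(addr, driver)
```

On the machine in this report, the expectation would be vfio-pci for
0000:06:00.0 and amdgpu for 0000:07:00.0, matching the lspci output.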