[Bug reporting] Radeon and GPU lockup

From: Patrick Dung
Date: Tue May 16 2017 - 02:52:39 EST


On kernel 4.10.14, the X Window screen turned to garbage.
At the time, two different users were logged in on two different
virtual consoles, both running X. One of the sessions was idle (X
screen locked).
May 16 01:08:39 home kernel: [25854.797258] WARNING: CPU: 9 PID: 2480 at drivers/gpu/drm/radeon/radeon_object.c:84 radeon_ttm_bo_destroy+0xf6/0x100 [radeon]
May 16 01:08:39 home kernel: [25854.797259] Modules linked in: ebtable_filter ebtables ip6_tables mpt3sas raid_class mptctl mptbase vmnet(OE) ppdev parport_pc parport fuse vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) bridge stp llc cfg80211 rfkill xt_conntrack xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nct6775 hwmon_vid lm92 xfs dm_raid btrfs intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt intel_uncore snd_hda_codec_realtek iTCO_vendor_support mxm_wmi snd_hda_codec_hdmi snd_hda_codec_generic intel_rapl_perf snd_usb_audio snd_hda_intel snd_hda_codec snd_usbmidi_lib joydev snd_rawmidi snd_hda_core snd_hwdep
May 16 01:08:39 home kernel: [25854.797281] snd_seq ses snd_seq_device enclosure pcspkr ipmi_ssif scsi_transport_sas snd_pcm i2c_i801 snd_timer snd lpc_ich soundcore shpchp ipmi_devintf ipmi_msghandler wmi acpi_power_meter target_core_mod tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq amdkfd amd_iommu_v2 radeon raid1 drm_kms_helper ttm igb crc32c_intel drm ptp pps_core dca megaraid_sas i2c_algo_bit fjes vhost_net tun vhost macvtap macvlan
May 16 01:08:39 home kernel: [25854.797299] CPU: 9 PID: 2480 Comm: systemd-logind Tainted: G W OE 4.10.14-200.fc25.x86_64 #1
May 16 01:08:39 home kernel: [25854.797300] Hardware name: (REDACTED)
May 16 01:08:39 home kernel: [25854.797301] Call Trace:
May 16 01:08:39 home kernel: [25854.797303] dump_stack+0x63/0x86
May 16 01:08:39 home kernel: [25854.797305] __warn+0xcb/0xf0
May 16 01:08:39 home kernel: [25854.797307] warn_slowpath_null+0x1d/0x20
May 16 01:08:39 home kernel: [25854.797320] radeon_ttm_bo_destroy+0xf6/0x100 [radeon]
May 16 01:08:39 home kernel: [25854.797324] ttm_bo_release_list+0xcb/0x210 [ttm]
May 16 01:08:39 home kernel: [25854.797325] ? dma_fence_context_alloc+0x20/0x20
May 16 01:08:39 home kernel: [25854.797329] ttm_bo_release+0x198/0x240 [ttm]
May 16 01:08:39 home kernel: [25854.797332] ttm_bo_unref+0x24/0x30 [ttm]
May 16 01:08:39 home kernel: [25854.797344] radeon_bo_unref+0x39/0x70 [radeon]
May 16 01:08:39 home kernel: [25854.797358] radeon_gem_object_free+0x57/0x70 [radeon]
May 16 01:08:39 home kernel: [25854.797365] drm_gem_object_free+0x29/0x70 [drm]
May 16 01:08:39 home kernel: [25854.797371] drm_gem_object_unreference_unlocked+0x3a/0xa0 [drm]
May 16 01:08:39 home kernel: [25854.797378] drm_gem_object_handle_unreference_unlocked+0x65/0xb0 [drm]
May 16 01:08:39 home kernel: [25854.797385] drm_gem_object_release_handle+0x53/0x90 [drm]
May 16 01:08:39 home kernel: [25854.797388] idr_for_each+0xb0/0x110
May 16 01:08:39 home kernel: [25854.797395] ? drm_gem_object_handle_unreference_unlocked+0xb0/0xb0 [drm]
May 16 01:08:39 home kernel: [25854.797402] drm_gem_release+0x20/0x30 [drm]
May 16 01:08:39 home kernel: [25854.797409] drm_release+0x34c/0x3a0 [drm]
May 16 01:08:39 home kernel: [25854.797411] __fput+0xdf/0x1e0
May 16 01:08:39 home kernel: [25854.797414] ____fput+0xe/0x10
May 16 01:08:39 home kernel: [25854.797416] task_work_run+0x80/0xa0
May 16 01:08:39 home kernel: [25854.797418] exit_to_usermode_loop+0xaa/0xb0
May 16 01:08:39 home kernel: [25854.797420] do_syscall_64+0x16d/0x180
May 16 01:08:39 home kernel: [25854.797422] entry_SYSCALL64_slow_path+0x25/0x25
May 16 01:08:39 home kernel: [25854.797424] RIP: 0033:0x7f31e5944680
May 16 01:08:39 home kernel: [25854.797425] RSP: 002b:00007ffe7be14358 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
May 16 01:08:39 home kernel: [25854.797427] RAX: 0000000000000000 RBX: 000055d4f1a7a8e0 RCX: 00007f31e5944680
May 16 01:08:39 home kernel: [25854.797428] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000020
May 16 01:08:39 home kernel: [25854.797429] RBP: 000055d4f1aab8a0 R08: 000055d4f1a7b230 R09: 000055d4f1a7b230
May 16 01:08:39 home kernel: [25854.797430] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
May 16 01:08:39 home kernel: [25854.797431] R13: 000055d4f1a74e10 R14: 0000000000000000 R15: 0000000000000000
May 16 01:08:39 home kernel: [25854.797433] ---[ end trace d11a1848b44c1081 ]---


[26087.644048] radeon 0000:02:00.0: ring 0 stalled for more than 234204msec
[26087.648140] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26088.148016] radeon 0000:02:00.0: ring 0 stalled for more than 234708msec
[26088.152007] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26088.652044] radeon 0000:02:00.0: ring 0 stalled for more than 235212msec
[26088.656065] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26089.155995] radeon 0000:02:00.0: ring 0 stalled for more than 235716msec
[26089.159901] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26089.659989] radeon 0000:02:00.0: ring 0 stalled for more than 236220msec
[26089.663798] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26090.163940] radeon 0000:02:00.0: ring 0 stalled for more than 236724msec
[26090.167698] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26090.667956] radeon 0000:02:00.0: ring 0 stalled for more than 237228msec
[26090.671653] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26091.171895] radeon 0000:02:00.0: ring 0 stalled for more than 237732msec
[26091.175475] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26091.675879] radeon 0000:02:00.0: ring 0 stalled for more than 238236msec
[26091.679383] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26092.179866] radeon 0000:02:00.0: ring 0 stalled for more than 238740msec
[26092.183376] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26092.683856] radeon 0000:02:00.0: ring 0 stalled for more than 239244msec
[26092.687362] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26093.187830] radeon 0000:02:00.0: ring 0 stalled for more than 239748msec
[26093.191337] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
[26093.691832] radeon 0000:02:00.0: ring 0 stalled for more than 240252msec
[26093.695340] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)
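
The lockup messages above all report the same current fence id, which suggests that ring 0 made no progress while the messages were being logged. For anyone triaging a similar log, a minimal Python sketch for checking this is given below. The function name `summarize_lockups` and the regular expression are illustrative assumptions written against the message format shown in this report; they are not part of any radeon tooling, and other kernel versions may word the message differently.

```python
import re

# Pattern for the "GPU lockup" lines as they appear in this report.
# Assumption: the message format matches the log excerpt above.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] radeon (?P<dev>\S+): GPU lockup "
    r"\(current fence id (?P<cur>0x[0-9a-fA-F]+) "
    r"last fence id (?P<last>0x[0-9a-fA-F]+) on ring (?P<ring>\d+)\)"
)


def summarize_lockups(log_text):
    """Return (message count, set of current fence ids, pending fences).

    'pending' is last fence id minus current fence id, taken from the
    final matching line, or None if no line matched.
    """
    count = 0
    current_ids = set()
    pending = None
    for match in LOCKUP_RE.finditer(log_text):
        count += 1
        cur = int(match.group("cur"), 16)
        last = int(match.group("last"), 16)
        current_ids.add(cur)
        pending = last - cur
    return count, current_ids, pending


# Two lines copied from the log above.
sample = (
    "[26087.648140] radeon 0000:02:00.0: GPU lockup (current fence id "
    "0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)\n"
    "[26088.152007] radeon 0000:02:00.0: GPU lockup (current fence id "
    "0x0000000000559c1a last fence id 0x0000000000559c48 on ring 0)\n"
)

count, current_ids, pending = summarize_lockups(sample)
print(count, sorted(hex(i) for i in current_ids), pending)
```

If the set of current fence ids contains a single value across many messages, the ring did not advance between them; the pending count is how many fences were still outstanding on that ring.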