Re: Yet another RX Vega hang with another kernel panic signature. WARNING: inconsistent lock state

From: Yang, Philip
Date: Thu Jan 31 2019 - 09:22:13 EST


I found the same issue while debugging. I will submit a patch to fix it shortly.

Philip

On 2019-01-30 10:35 p.m., Mikhail Gavrilov wrote:
> Hi folks.
> Yet another kernel panic happened when the GPU hung again:
>
> [ 1469.906798] ================================
> [ 1469.906799] WARNING: inconsistent lock state
> [ 1469.906801] 5.0.0-0.rc4.git2.2.fc30.x86_64 #1 Tainted: G C
> [ 1469.906802] --------------------------------
> [ 1469.906804] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ 1469.906806] kworker/12:3/681 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [ 1469.906807] 00000000d591b82b (&(&adev->vm_manager.pasid_lock)->rlock){?...}, at: amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.906851] {IN-HARDIRQ-W} state was registered at:
> [ 1469.906855] _raw_spin_lock+0x31/0x80
> [ 1469.906893] amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.906936] gmc_v9_0_process_interrupt+0x198/0x2b0 [amdgpu]
> [ 1469.906978] amdgpu_irq_dispatch+0x90/0x1f0 [amdgpu]
> [ 1469.907018] amdgpu_irq_callback+0x4a/0x70 [amdgpu]
> [ 1469.907061] amdgpu_ih_process+0x89/0x100 [amdgpu]
> [ 1469.907103] amdgpu_irq_handler+0x22/0x50 [amdgpu]
> [ 1469.907106] __handle_irq_event_percpu+0x3f/0x290
> [ 1469.907108] handle_irq_event_percpu+0x31/0x80
> [ 1469.907109] handle_irq_event+0x34/0x51
> [ 1469.907111] handle_edge_irq+0x7c/0x1a0
> [ 1469.907114] handle_irq+0xbf/0x100
> [ 1469.907116] do_IRQ+0x61/0x120
> [ 1469.907118] ret_from_intr+0x0/0x22
> [ 1469.907121] cpuidle_enter_state+0xbf/0x470
> [ 1469.907123] do_idle+0x1ec/0x280
> [ 1469.907125] cpu_startup_entry+0x19/0x20
> [ 1469.907127] start_secondary+0x1b3/0x200
> [ 1469.907129] secondary_startup_64+0xa4/0xb0
> [ 1469.907131] irq event stamp: 5546749
> [ 1469.907133] hardirqs last enabled at (5546749): [<ffffffff9719112a>] ktime_get+0xfa/0x130
> [ 1469.907135] hardirqs last disabled at (5546748): [<ffffffff9719105b>] ktime_get+0x2b/0x130
> [ 1469.907137] softirqs last enabled at (5498318): [<ffffffff97e0035f>] __do_softirq+0x35f/0x46a
> [ 1469.907140] softirqs last disabled at (5497393): [<ffffffff970ee119>] irq_exit+0x119/0x120
> [ 1469.907141]
> other info that might help us debug this:
> [ 1469.907142] Possible unsafe locking scenario:
>
> [ 1469.907143] CPU0
> [ 1469.907144] ----
> [ 1469.907144] lock(&(&adev->vm_manager.pasid_lock)->rlock);
> [ 1469.907146] <Interrupt>
> [ 1469.907147] lock(&(&adev->vm_manager.pasid_lock)->rlock);
> [ 1469.907148]
> *** DEADLOCK ***
>
> [ 1469.907150] 2 locks held by kworker/12:3/681:
> [ 1469.907152] #0: 00000000953235a7 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1e9/0x5d0
> [ 1469.907157] #1: 0000000071a3d218 ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at: process_one_work+0x1e9/0x5d0
> [ 1469.907160]
> stack backtrace:
> [ 1469.907163] CPU: 12 PID: 681 Comm: kworker/12:3 Tainted: G C 5.0.0-0.rc4.git2.2.fc30.x86_64 #1
> [ 1469.907165] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
> [ 1469.907169] Workqueue: events drm_sched_job_timedout [gpu_sched]
> [ 1469.907171] Call Trace:
> [ 1469.907176] dump_stack+0x85/0xc0
> [ 1469.907180] print_usage_bug.cold+0x1ae/0x1e8
> [ 1469.907183] ? print_shortest_lock_dependencies+0x40/0x40
> [ 1469.907185] mark_lock+0x50a/0x600
> [ 1469.907186] ? print_shortest_lock_dependencies+0x40/0x40
> [ 1469.907189] __lock_acquire+0x544/0x1660
> [ 1469.907191] ? mark_held_locks+0x57/0x80
> [ 1469.907193] ? trace_hardirqs_on_thunk+0x1a/0x1c
> [ 1469.907195] ? lockdep_hardirqs_on+0xed/0x180
> [ 1469.907197] ? trace_hardirqs_on_thunk+0x1a/0x1c
> [ 1469.907200] ? retint_kernel+0x10/0x10
> [ 1469.907202] lock_acquire+0xa2/0x1b0
> [ 1469.907242] ? amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.907245] _raw_spin_lock+0x31/0x80
> [ 1469.907283] ? amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.907323] amdgpu_vm_get_task_info+0x23/0x80 [amdgpu]
> [ 1469.907324] ------------[ cut here ]------------
>
>
> My kernel commit is: 62967898789d
>
>
>
> --
> Best Regards,
> Mike Gavrilov.
>
>
>
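
For readers of the trace: lockdep reports that pasid_lock is taken
with a plain spin lock both from the hard interrupt path
(amdgpu_irq_handler -> ... -> amdgpu_vm_get_task_info) and from
process context with interrupts enabled (the drm_sched_job_timedout
worker). If the interrupt arrives on the same CPU while the worker
holds the lock, the handler spins forever. The sketch below is only a
userspace model of that scenario and of the conventional remedy for
this class of report (saving and disabling interrupts around the
critical section, as spin_lock_irqsave() does in the kernel). It is
not the patch mentioned above, and every model_* name is hypothetical
and invented for illustration; none of them exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. These are NOT kernel types or functions. */
struct model_cpu {
    bool irqs_enabled; /* models the CPU's hardirq-enable state */
    bool lock_held;    /* models pasid_lock being held on this CPU */
    bool irq_pending;  /* a GPU fault interrupt is waiting for delivery */
    bool deadlocked;   /* set when the handler would spin on a held lock */
};

/* Models the interrupt path that looks up task info under the lock. */
void model_irq_handler(struct model_cpu *cpu)
{
    if (cpu->lock_held) {
        /* Same CPU already holds the lock: a real spin lock never returns. */
        cpu->deadlocked = true;
        return;
    }
    cpu->lock_held = true;  /* lookup would happen here */
    cpu->lock_held = false;
}

/* Delivers a pending interrupt only if interrupts are enabled. */
void model_deliver_irq(struct model_cpu *cpu)
{
    if (cpu->irq_pending && cpu->irqs_enabled) {
        cpu->irq_pending = false;
        cpu->irqs_enabled = false; /* handler runs with interrupts off */
        model_irq_handler(cpu);
        cpu->irqs_enabled = true;
    }
}

/* Process-context lookup with a plain lock: interrupts stay enabled. */
bool model_lookup_plain(struct model_cpu *cpu)
{
    cpu->lock_held = true;
    model_deliver_irq(cpu); /* interrupt lands inside the critical section */
    if (cpu->deadlocked)
        return false;
    cpu->lock_held = false;
    return true;
}

/* Process-context lookup in the style of spin_lock_irqsave(). */
bool model_lookup_irqsave(struct model_cpu *cpu)
{
    bool saved = cpu->irqs_enabled; /* save, then disable interrupts */

    cpu->irqs_enabled = false;
    cpu->lock_held = true;
    model_deliver_irq(cpu);         /* masked: the handler cannot run here */
    cpu->lock_held = false;
    cpu->irqs_enabled = saved;      /* restore the saved state */
    model_deliver_irq(cpu);         /* pending interrupt runs, lock is free */
    return !cpu->deadlocked;
}

/* Runs both variants against the same starting conditions. */
void model_self_check(void)
{
    struct model_cpu plain = { .irqs_enabled = true, .irq_pending = true };
    struct model_cpu saved = { .irqs_enabled = true, .irq_pending = true };

    assert(!model_lookup_plain(&plain));  /* models the reported deadlock */
    assert(model_lookup_irqsave(&saved)); /* interrupt is deferred safely */
}
```

Calling model_self_check() exercises both paths. The model is single
CPU on purpose, since that is the case the "Possible unsafe locking
scenario" section of the report describes.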