Re: [regression drm/nouveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

From: Ilia Mirkin
Date: Tue Jul 11 2017 - 13:51:43 EST


Some details that would be useful in analyzing the bug:

1. Output of lspci -nn -d 10de:
2. Which displays, if any, are plugged into the NVIDIA board when
this happens?
3. Any boot parameters, especially ones relating to ACPI or PM?
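
Something along these lines should gather all three items in one go. It is
only a rough sketch: it assumes pciutils (lspci) is installed, that sysfs is
mounted at /sys, and that the connectors show up under /sys/class/drm. The
collect_info helper name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: gathers the three items asked for above.
# Assumptions: pciutils is installed, sysfs is mounted at /sys.

collect_info() {
    echo "== NVIDIA PCI devices =="
    if command -v lspci >/dev/null 2>&1; then
        # -nn prints names plus numeric IDs; -d 10de: filters on NVIDIA's vendor ID
        lspci -nn -d 10de:
    else
        echo "lspci not found"
    fi

    echo "== DRM connector status =="
    for f in /sys/class/drm/card*-*/status; do
        # Skip the unexpanded glob pattern when no connectors exist
        [ -r "$f" ] || continue
        printf '%s: %s\n' "${f%/status}" "$(cat "$f")"
    done

    echo "== Kernel command line =="
    cat /proc/cmdline 2>/dev/null || echo "/proc/cmdline not readable"
}

collect_info
```

Run it on the affected machine before suspending, with the usual displays
attached, and paste the output into the reply.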

Cheers,

-ilia

On Tue, Jul 11, 2017 at 1:32 PM, Mike Galbraith <efault@xxxxxx> wrote:
> Greetings,
>
> I met $subject in master-rt post drm merge, but taking the config
> (attached) to virgin v4.12-10624-g9967468c0a10, it's reproducible.
>
> KERNEL: vmlinux-4.12.0.g9967468-preempt.gz
> DUMPFILE: vmcore
> CPUS: 8
> DATE: Tue Jul 11 18:55:28 2017
> UPTIME: 00:02:03
> LOAD AVERAGE: 3.43, 1.39, 0.52
> TASKS: 467
> NODENAME: homer
> RELEASE: 4.12.0.g9967468-preempt
> VERSION: #155 SMP PREEMPT Tue Jul 11 18:18:11 CEST 2017
> MACHINE: x86_64 (3591 Mhz)
> MEMORY: 16 GB
> PANIC: "BUG: unable to handle kernel paging request at ffffffffa022990f"
> PID: 4658
> COMMAND: "kworker/u16:26"
> TASK: ffff8803c6068f80 [THREAD_INFO: ffff8803c6068f80]
> CPU: 7
> STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 4658 TASK: ffff8803c6068f80 CPU: 7 COMMAND: "kworker/u16:26"
> #0 [ffffc900039f76a0] machine_kexec at ffffffff810481fc
> #1 [ffffc900039f76f0] __crash_kexec at ffffffff81109e3a
> #2 [ffffc900039f77b0] crash_kexec at ffffffff8110adc9
> #3 [ffffc900039f77c8] oops_end at ffffffff8101d059
> #4 [ffffc900039f77e8] no_context at ffffffff81055ce5
> #5 [ffffc900039f7838] do_page_fault at ffffffff81056c5b
> #6 [ffffc900039f7860] page_fault at ffffffff81690a88
> [exception RIP: report_bug+93]
> RIP: ffffffff8167227d RSP: ffffc900039f7918 RFLAGS: 00010002
> RAX: ffffffffa0229905 RBX: ffffffffa020af0f RCX: 0000000000000001
> RDX: 0000000000000907 RSI: ffffffffa020af11 RDI: ffffffffffff98f6
> RBP: ffffc900039f7a58 R8: 0000000000000001 R9: 00000000000003fc
> R10: ffffffff81a01906 R11: ffff8803f84711f8 R12: ffffffffa02231fb
> R13: 0000000000000260 R14: 0000000000000004 R15: 0000000000000006
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #7 [ffffc900039f7910] report_bug at ffffffff81672248
> #8 [ffffc900039f7938] fixup_bug at ffffffff8101af85
> #9 [ffffc900039f7950] do_trap at ffffffff8101b0d9
> #10 [ffffc900039f79a0] do_error_trap at ffffffff8101b190
> #11 [ffffc900039f7a50] invalid_op at ffffffff8169063e
> [exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335]
> RIP: ffffffffa020af0f RSP: ffffc900039f7b00 RFLAGS: 00010086
> RAX: ffffffffa04fa100 RBX: ffff8803f9550800 RCX: 0000000000000001
> RDX: ffffffffa0228a58 RSI: 0000000000000001 RDI: ffffffffa022321b
> RBP: ffffc900039f7b80 R8: 0000000000000000 R9: ffffffffa020adc0
> R10: ffffffffa048a1b0 R11: ffff8803f84711f8 R12: 0000000000000001
> R13: ffff8803f8471000 R14: ffffc900039f7b94 R15: ffffc900039f7bd0
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #12 [ffffc900039f7b18] gf119_head_vblank_put at ffffffffa04422f9 [nouveau]
> #13 [ffffc900039f7b88] drm_get_last_vbltimestamp at ffffffffa020ad91 [drm]
> #14 [ffffc900039f7ba8] drm_update_vblank_count at ffffffffa020b3e1 [drm]
> #15 [ffffc900039f7c10] drm_vblank_disable_and_save at ffffffffa020bbe9 [drm]
> #16 [ffffc900039f7c40] drm_crtc_vblank_off at ffffffffa020c3c0 [drm]
> #17 [ffffc900039f7cb0] nouveau_display_fini at ffffffffa048a4d6 [nouveau]
> #18 [ffffc900039f7ce0] nouveau_display_suspend at ffffffffa048ac4f [nouveau]
> #19 [ffffc900039f7d00] nouveau_do_suspend at ffffffffa047e5ec [nouveau]
> #20 [ffffc900039f7d38] nouveau_pmops_suspend at ffffffffa047e77d [nouveau]
> #21 [ffffc900039f7d50] pci_pm_suspend at ffffffff813b1ff0
> #22 [ffffc900039f7d80] dpm_run_callback at ffffffff814c4dbd
> #23 [ffffc900039f7db8] __device_suspend at ffffffff814c5a61
> #24 [ffffc900039f7e30] async_suspend at ffffffff814c5cfa
> #25 [ffffc900039f7e48] async_run_entry_fn at ffffffff81091683
> #26 [ffffc900039f7e70] process_one_work at ffffffff810882bc
> #27 [ffffc900039f7eb0] worker_thread at ffffffff8108854a
> #28 [ffffc900039f7f10] kthread at ffffffff8108e387
> #29 [ffffc900039f7f50] ret_from_fork at ffffffff8168fa85
> crash> gdb list *drm_calc_vbltimestamp_from_scanoutpos+335
> 0xffffffffa020af0f is in drm_calc_vbltimestamp_from_scanoutpos (drivers/gpu/drm/drm_vblank.c:608).
> 603 /* If mode timing undefined, just return as no-op:
> 604 * Happens during initial modesetting of a crtc.
> 605 */
> 606 if (mode->crtc_clock == 0) {
> 607 DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
> 608 WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
> 609
> 610 return false;
> 611 }
> 612
> crash> gdb list *report_bug+93
> 0xffffffff8167227d is in report_bug (lib/bug.c:177).
> 172 return BUG_TRAP_TYPE_WARN;
> 173
> 174 /*
> 175 * Since this is the only store, concurrency is not an issue.
> 176 */
> 177 bug->flags |= BUGFLAG_DONE;
> 178 }
> 179 }
> 180
> 181 if (warning) {
> crash>