Re: drm/mgag200: doesn't work in panic context

From: Daniel Vetter
Date: Fri Jun 26 2015 - 05:28:12 EST


On Fri, Jun 26, 2015 at 9:55 AM, Rui Wang <rui.y.wang@xxxxxxxxx> wrote:
> Hi all,
>
> I'd like to report two panics that hang forever (the machine cannot reboot). This happens because mgag200 doesn't work in panic context: it sleeps and allocates memory non-atomically.

This is the same for all drm drivers: the drm atomic handling with
fbcon/fbdev (i.e. when called from panic context) is totally broken.
It would be serious work to fix this properly.
-Daniel

>
> These were triggered while injecting machine checks using einj.
>
> 1)
>
> [321381.466885] ------------[ cut here ]------------
> [321381.472144] WARNING: CPU: 136 PID: 0 at kernel/time/timer.c:1098 del_timer_sync+0x36/0x60()
> [321381.481571] Modules linked in: einj(E) nmioe(E) iscsi_ibft(E) iscsi_boot_sysfs(E) af_packet(E) x86_pkg_temp_thermal(E) btrfs(E) intel_powerclamp(E) coretemp(E) kvm(E) xor(E) crct10dif_pclmul(E) raid6_pq(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) iTCO_wdt(E) iTCO_vendor_support(E) joydev(E) aesni_intel(E) lpc_ich(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) cryptd(E) pcspkr(E) mfd_core(E) i2c_i801(E) wmi(E) edac_core(E) shpchp(E) ipmi_si(E) ipmi_msghandler(E) processor(E) acpi_pad(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ehci_pci(E) ehci_hcd(E) drm_kms_helper(E) ixgbe(E) ahci(E) igb(E) mdio(E) ttm(E) libahci(E) ptp(E) i2c_algo_bit(E) usbcore(E) pps_core(E) drm(E) libata(E) megaraid_sas(E) usb_common(E) dca(E) sg(E) scsi_mod(E) autofs4(E)
> [321381.572300] CPU: 136 PID: 0 Comm: swapper/136 Tainted: G W E 4.1.0-rc8-7-default+ #4
> [321381.582117] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
> [321381.593777] ffffffff81818089 ffff88047fc88808 ffffffff8157d67e 0000000000000000
> [321381.602184] 0000000000000000 ffff88047fc88848 ffffffff810637fa ffff88046e4bc740
> [321381.610595] ffff88047fc888a8 ffff88047fc888a8 0000000104c6f0f8 ffff88047f5cdb00
> [321381.619006] Call Trace:
> [321381.621834] <#MC> [<ffffffff8157d67e>] dump_stack+0x4c/0x65
> [321381.628358] [<ffffffff810637fa>] warn_slowpath_common+0x8a/0xc0
> [321381.635168] [<ffffffff810638ea>] warn_slowpath_null+0x1a/0x20
> [321381.641775] [<ffffffff810cb316>] del_timer_sync+0x36/0x60
> [321381.647995] [<ffffffff81582bf0>] schedule_timeout+0x150/0x280
> [321381.654611] [<ffffffff812cc9fb>] ? idr_alloc+0x7b/0xe0
> [321381.660547] [<ffffffff810c9c90>] ? internal_add_timer+0x80/0x80
> [321381.667359] [<ffffffff810cb85c>] msleep+0x3c/0x50
> [321381.672812] [<ffffffffa0145607>] mga_crtc_prepare+0x167/0x370 [mgag200]
> [321381.680404] [<ffffffffa04f38b6>] drm_crtc_helper_set_mode+0x2d6/0x530 [drm_kms_helper]
> [321381.689453] [<ffffffffa04f4896>] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
> [321381.698706] [<ffffffffa00a3318>] drm_mode_set_config_internal+0x68/0x100 [drm]
> [321381.706971] [<ffffffffa04fe8b2>] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
> [321381.715244] [<ffffffffa04feaa3>] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
> [321381.724780] [<ffffffffa04ff6f9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
> [321381.733144] [<ffffffff8108270d>] notifier_call_chain+0x4d/0x80
> [321381.739859] [<ffffffff81082791>] atomic_notifier_call_chain+0x21/0x30
> [321381.747252] [<ffffffff815792d4>] panic+0xee/0x1f5
> [321381.752704] [<ffffffff8102d272>] mce_panic+0x1e2/0x200
> [321381.758640] [<ffffffff8102d303>] mce_timed_out+0x73/0x80
> [321381.764762] [<ffffffff8102e8a1>] do_machine_check+0x5f1/0xae0
> [321381.771377] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
> [321381.777499] [<ffffffff81585d49>] machine_check+0x29/0x50
> [321381.783630] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
> [321381.789760] <<EOE>> [<ffffffff81450170>] cpuidle_enter_state+0x70/0x1f0
> [321381.797457] [<ffffffff81450327>] cpuidle_enter+0x17/0x20
> [321381.803586] [<ffffffff810a5968>] cpu_startup_entry+0x308/0x390
> [321381.810297] [<ffffffff8103a203>] start_secondary+0x143/0x170
> [321381.816814] ---[ end trace 9f2a977c4a9be24e ]---
> [321381.822068] bad: scheduling from the idle thread!
> [321381.827421] CPU: 136 PID: 0 Comm: swapper/136 Tainted: G W E 4.1.0-rc8-7-default+ #4
> [321381.837238] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
> [321381.848898] ffff88046e4bc740 ffff88047fc887a8 ffffffff8157d67e 0000000000000000
> [321381.857305] ffff88047fc95300 ffff88047fc887c8 ffffffff81093675 ffff88047fc88808
> [321381.865713] ffff88047fc95300 ffff88047fc887f8 ffffffff8108796c 0000000100000000
> [321381.874124] Call Trace:
> [321381.876951] <#MC> [<ffffffff8157d67e>] dump_stack+0x4c/0x65
> [321381.883483] [<ffffffff81093675>] dequeue_task_idle+0x35/0x50
> [321381.890001] [<ffffffff8108796c>] dequeue_task+0x5c/0x80
> [321381.896027] [<ffffffff8108c56b>] deactivate_task+0x2b/0x30
> [321381.902352] [<ffffffff8157fcea>] __schedule+0x64a/0x910
> [321381.908385] [<ffffffff8157ffee>] schedule+0x3e/0x90
> [321381.914030] [<ffffffff81582be8>] schedule_timeout+0x148/0x280
> [321381.920636] [<ffffffff812cc9fb>] ? idr_alloc+0x7b/0xe0
> [321381.926570] [<ffffffff810c9c90>] ? internal_add_timer+0x80/0x80
> [321381.933382] [<ffffffff810cb85c>] msleep+0x3c/0x50
> [321381.938835] [<ffffffffa0145607>] mga_crtc_prepare+0x167/0x370 [mgag200]
> [321381.946428] [<ffffffffa04f38b6>] drm_crtc_helper_set_mode+0x2d6/0x530 [drm_kms_helper]
> [321381.955478] [<ffffffffa04f4896>] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
> [321381.964731] [<ffffffffa00a3318>] drm_mode_set_config_internal+0x68/0x100 [drm]
> [321381.973004] [<ffffffffa04fe8b2>] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
> [321381.981277] [<ffffffffa04feaa3>] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
> [321381.990811] [<ffffffffa04ff6f9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
> [321381.999174] [<ffffffff8108270d>] notifier_call_chain+0x4d/0x80
> [321382.005887] [<ffffffff81082791>] atomic_notifier_call_chain+0x21/0x30
> [321382.013280] [<ffffffff815792d4>] panic+0xee/0x1f5
> [321382.018731] [<ffffffff8102d272>] mce_panic+0x1e2/0x200
> [321382.024660] [<ffffffff8102d303>] mce_timed_out+0x73/0x80
> [321382.030787] [<ffffffff8102e8a1>] do_machine_check+0x5f1/0xae0
> [321382.037404] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
> [321382.043533] [<ffffffff81585d49>] machine_check+0x29/0x50
> [321382.049665] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
> [321382.055794] <<EOE>> [<ffffffff81450170>] cpuidle_enter_state+0x70/0x1f0
> [321382.063491] [<ffffffff81450327>] cpuidle_enter+0x17/0x20
> [321382.069623] [<ffffffff810a5968>] cpu_startup_entry+0x308/0x390
> [321382.076335] [<ffffffff8103a203>] start_secondary+0x143/0x170
> [321382.082877] ------------[ cut here ]------------
>
>
> 2)
>
> bkd04sdp:~ # [58109.056018] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> [58109.056058] mce: [Hardware Error]: Machine check events logged
> [58110.109873] Shutting down cpus with NMI
> [58110.176778] Kernel Offset: disabled
> [58110.180667] drm_kms_helper: panic occurred, switching back to text console
> [58110.188367] mga_delay choosing mdelay...
> [58110.242399] mga_delay choosing mdelay...
> [58110.266768] ------------[ cut here ]------------
> [58110.271926] kernel BUG at mm/vmalloc.c:1335!
> [58110.276695] invalid opcode: 0000 [#1] SMP
> [58110.281289] Modules linked in: einj(E) nmioe(E) iscsi_ibft(E) iscsi_boot_sysfs(E) af_packet(E) btrfs(E) xor(E) raid6_pq(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm(E) joydev(E) iTCO_wdt(E) iTCO_vendor_support(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) lpc_ich(E) cryptd(E) pcspkr(E) edac_core(E) mfd_core(E) i2c_i801(E) shpchp(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) acpi_pad(E) processor(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) ahci(E) ehci_pci(E) sysimgblt(E) drm_kms_helper(E) ehci_hcd(E) ixgbe(E) igb(E) ttm(E) libahci(E) mdio(E) ptp(E) usbcore(E) pps_core(E) drm(E) libata(E) i2c_algo_bit(E) usb_common(E) dca(E) megaraid_sas(E) sg(E) scsi_mod(E) autofs4(E)
> [58110.371884] CPU: 75 PID: 0 Comm: swapper/75 Tainted: G E 4.1.0-rc8-7-default+ #10
> [58110.381506] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
> [58110.393063] task: ffff88046ea6d580 ti: ffff88046ea70000 task.ti: ffff88046ea70000
> [58110.401422] RIP: 0010:[<ffffffff81189c65>] [<ffffffff81189c65>] __get_vm_area_node+0x155/0x160
> [58110.411156] RSP: 0018:ffff88047f7284b8 EFLAGS: 00010006
> [58110.417091] RAX: 0000000080010003 RBX: 0000000091000000 RCX: ffffc90000000000
> [58110.425065] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000000eb000
> [58110.433038] RBP: ffff88047f7284f8 R08: ffffe8ffffffffff R09: 00000000ffffffff
> [58110.441010] R10: ffff880036a6a700 R11: ffff880460ab69c0 R12: 00000000910eb000
> [58110.448983] R13: 0000000000000001 R14: 0000000091000000 R15: 00000000000eb000
> [58110.456955] FS: 0000000000000000(0000) GS:ffff88047f720000(0000) knlGS:0000000000000000
> [58110.465994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [58110.472413] CR2: 00007f5bdb9ac095 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [58110.480386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [58110.488358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [58110.496331] Stack:
> [58110.498577] ffff88047f728518 ffffc90000000000 00000000910eafff 0000000091000000
> [58110.506885] 00000000910eb000 0000000000000001 0000000091000000 00000000000eb000
> [58110.515192] ffff88047f728518 ffffffff8118ad20 00000000000000d0 ffffffffa0334f78
> [58110.523499] Call Trace:
> [58110.526229] <#MC>
> [58110.528382] [<ffffffff8118ad20>] get_vm_area_caller+0x40/0x50
> [58110.535111] [<ffffffffa0334f78>] ? ttm_mem_reg_ioremap+0xc8/0x110 [ttm]
> [58110.542607] [<ffffffff81050f18>] __ioremap_caller+0x188/0x390
> [58110.549127] [<ffffffff812d59b9>] ? find_next_bit+0x19/0x20
> [58110.555353] [<ffffffff81051177>] ioremap_wc+0x17/0x20
> [58110.561099] [<ffffffffa0334f78>] ttm_mem_reg_ioremap+0xc8/0x110 [ttm]
> [58110.568398] [<ffffffffa0335351>] ttm_bo_move_memcpy+0xd1/0x700 [ttm]
> [58110.575598] [<ffffffff811a6c55>] ? __kmalloc+0x4b5/0x4c0
> [58110.581632] [<ffffffffa01d9b48>] mgag200_bo_move+0x18/0x20 [mgag200]
> [58110.588830] [<ffffffffa0332ea0>] ttm_bo_handle_move_mem+0x260/0x590 [ttm]
> [58110.596514] [<ffffffffa03337d2>] ? ttm_bo_mem_space+0xd2/0x320 [ttm]
> [58110.603705] [<ffffffffa0333eb2>] ttm_bo_validate+0x1c2/0x1d0 [ttm]
> [58110.610711] [<ffffffff8113f681>] ? irq_work_queue+0x11/0x90
> [58110.617037] [<ffffffffa01da3d3>] mgag200_bo_push_sysram+0x93/0xe0 [mgag200]
> [58110.624915] [<ffffffffa01d5a26>] mga_crtc_do_set_base.isra.8.constprop.21+0x76/0x410 [mgag200]
> [58110.634636] [<ffffffffa01d6e02>] mga_crtc_mode_set+0x1042/0x2140 [mgag200]
> [58110.642416] [<ffffffffa01d5492>] ? mga_crtc_prepare+0x132/0x370 [mgag200]
> [58110.650106] [<ffffffffa04ec8db>] drm_crtc_helper_set_mode+0x2fb/0x530 [drm_kms_helper]
> [58110.659052] [<ffffffffa04ed896>] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
> [58110.668217] [<ffffffffa00b5318>] drm_mode_set_config_internal+0x68/0x100 [drm]
> [58110.676388] [<ffffffffa04f78b2>] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
> [58110.684558] [<ffffffffa04f7aa3>] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
> [58110.693989] [<ffffffffa04f86f9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
> [58110.702260] [<ffffffff81081bad>] notifier_call_chain+0x4d/0x80
> [58110.708873] [<ffffffff81081c31>] atomic_notifier_call_chain+0x21/0x30
> [58110.716169] [<ffffffff8156f954>] panic+0xee/0x1f5
> [58110.721530] [<ffffffff8102d232>] mce_panic+0x1e2/0x200
> [58110.727366] [<ffffffff8102d2c3>] mce_timed_out+0x73/0x80
> [58110.733396] [<ffffffff8102e861>] do_machine_check+0x5f1/0xae0
> [58110.739915] [<ffffffff8133f52f>] ? intel_idle+0xbf/0x130
> [58110.745952] [<ffffffff8157c3c9>] machine_check+0x29/0x50
> [58110.751984] [<ffffffff8133f52f>] ? intel_idle+0xbf/0x130
> [58110.758017] <<EOE>>
> [58110.760362] [<ffffffff814467f0>] cpuidle_enter_state+0x70/0x1f0
> [58110.767275] [<ffffffff814469a7>] cpuidle_enter+0x17/0x20
> [58110.773309] [<ffffffff810a4e18>] cpu_startup_entry+0x308/0x390
> [58110.779916] [<ffffffff8103a163>] start_secondary+0x143/0x170
> [58110.786325] Code: 00 00 48 0f bd cf 83 c1 01 83 f9 0c 0f 4c c8 b0 1e 83 f9 1e 0f 4f c8 49 d3 e6 e9 f8 fe ff ff 48 89 df e8 9f a8 01 00 31 c0 eb b8 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 c8 41
> [58110.808146] RIP [<ffffffff81189c65>] __get_vm_area_node+0x155/0x160
> [58110.815257] RSP <ffff88047f7284b8>
> [58110.820218] ---[ end trace ab0c230901a0ee95 ]---
>
> Thanks
> Rui
>
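In the first trace, mga_crtc_prepare() calls msleep() from the panic notifier chain, and the second log prints "mga_delay choosing mdelay...", which suggests a debug helper that busy-waits instead of sleeping when sleeping is not allowed. The sketch below illustrates that idea in portable userspace C under stated assumptions: hypothetical_in_panic and hypothetical_mga_delay() are invented names, not kernel symbols, and the two stubs merely stand in for the real kernel primitives msleep() (may schedule) and mdelay() (spins). Note that this only covers the sleeping half of the problem. The second trace fails later in ioremap_wc() / __get_vm_area_node(), which a delay helper does not address.

```c
#include <stdbool.h>

/* HYPOTHETICAL: stand-in for "we are in panic/atomic context and must
 * not sleep". This is not a real kernel symbol. */
static bool hypothetical_in_panic;

/* Which delay primitive the helper picked (for illustration). */
enum delay_kind { DELAY_SLEEP, DELAY_BUSY_WAIT };

/* Stubs standing in for msleep() and mdelay(). They only accumulate the
 * requested time so the sketch stays portable and fast to run. */
static unsigned long slept_ms, spun_ms;

static void stub_msleep(unsigned int ms) { slept_ms += ms; }
static void stub_mdelay(unsigned int ms) { spun_ms += ms; }

/* HYPOTHETICAL helper modelled on the "mga_delay choosing mdelay..."
 * debug output: never sleep when sleeping is forbidden. */
static enum delay_kind hypothetical_mga_delay(unsigned int ms)
{
    if (hypothetical_in_panic) {
        stub_mdelay(ms);   /* busy-wait: does not need the scheduler */
        return DELAY_BUSY_WAIT;
    }
    stub_msleep(ms);       /* normal path: yielding the CPU is fine */
    return DELAY_SLEEP;
}
```

Busy-waiting wastes CPU time, so a helper like this would only take that path when the caller cannot sleep; in normal modesetting it keeps using the sleeping primitive.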



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch