drm/mgag200: doesn't work in panic context

From: Rui Wang
Date: Fri Jun 26 2015 - 04:13:49 EST


Hi all,

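I'm reporting two panics that hang forever (the machine cannot reboot). The cause is that mgag200 doesn't work in panic context: it sleeps (msleep() called from mga_crtc_prepare(), trace 1) and allocates memory non-atomically (ioremap_wc() reached through mgag200_bo_push_sysram(), trace 2).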

These were triggered while injecting machine checks using einj.
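
To make the first problem concrete, here is a minimal userspace sketch of the decision a panic-safe delay helper would have to make. It is not the driver code: the names exec_context, delay_kind and choose_delay are hypothetical, and the flags are plain booleans standing in for whatever check the kernel side would really use. The point is only that a path reachable from the panic notifier chain must busy-wait instead of calling a sleeping primitive.

```c
#include <stdbool.h>

/* Hypothetical model: which kind of delay a code path may use. */
enum delay_kind {
    DELAY_SLEEP,     /* may schedule, like msleep() */
    DELAY_BUSY_WAIT  /* spins without scheduling, like mdelay() */
};

/* Hypothetical stand-in for the execution context of the caller. */
struct exec_context {
    bool in_panic;       /* running from the panic notifier chain */
    bool irqs_disabled;  /* interrupts are off, so sleeping is illegal */
};

/*
 * Pick a delay primitive that is legal in the given context.
 * Sleeping is only allowed when neither restriction applies.
 */
static enum delay_kind choose_delay(const struct exec_context *ctx)
{
    if (ctx->in_panic || ctx->irqs_disabled)
        return DELAY_BUSY_WAIT;
    return DELAY_SLEEP;
}
```

In trace 1 the mode-set path takes the sleeping branch unconditionally, which is why the scheduler complains about scheduling from the idle thread inside the machine check handler.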

1)

[321381.466885] ------------[ cut here ]------------
[321381.472144] WARNING: CPU: 136 PID: 0 at kernel/time/timer.c:1098 del_timer_sync+0x36/0x60()
[321381.481571] Modules linked in: einj(E) nmioe(E) iscsi_ibft(E) iscsi_boot_sysfs(E) af_packet(E) x86_pkg_temp_thermal(E) btrfs(E) intel_powerclamp(E) coretemp(E) kvm(E) xor(E) crct10dif_pclmul(E) raid6_pq(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) iTCO_wdt(E) iTCO_vendor_support(E) joydev(E) aesni_intel(E) lpc_ich(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) cryptd(E) pcspkr(E) mfd_core(E) i2c_i801(E) wmi(E) edac_core(E) shpchp(E) ipmi_si(E) ipmi_msghandler(E) processor(E) acpi_pad(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ehci_pci(E) ehci_hcd(E) drm_kms_helper(E) ixgbe(E) ahci(E) igb(E) mdio(E) ttm(E) libahci(E) ptp(E) i2c_algo_bit(E) usbcore(E) pps_core(E) drm(E) libata(E) megaraid_sas(E) usb_common(E) dca(E) sg(E) scsi_mod(E) autofs4(E)
[321381.572300] CPU: 136 PID: 0 Comm: swapper/136 Tainted: G W E 4.1.0-rc8-7-default+ #4
[321381.582117] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[321381.593777] ffffffff81818089 ffff88047fc88808 ffffffff8157d67e 0000000000000000
[321381.602184] 0000000000000000 ffff88047fc88848 ffffffff810637fa ffff88046e4bc740
[321381.610595] ffff88047fc888a8 ffff88047fc888a8 0000000104c6f0f8 ffff88047f5cdb00
[321381.619006] Call Trace:
[321381.621834] <#MC> [<ffffffff8157d67e>] dump_stack+0x4c/0x65
[321381.628358] [<ffffffff810637fa>] warn_slowpath_common+0x8a/0xc0
[321381.635168] [<ffffffff810638ea>] warn_slowpath_null+0x1a/0x20
[321381.641775] [<ffffffff810cb316>] del_timer_sync+0x36/0x60
[321381.647995] [<ffffffff81582bf0>] schedule_timeout+0x150/0x280
[321381.654611] [<ffffffff812cc9fb>] ? idr_alloc+0x7b/0xe0
[321381.660547] [<ffffffff810c9c90>] ? internal_add_timer+0x80/0x80
[321381.667359] [<ffffffff810cb85c>] msleep+0x3c/0x50
[321381.672812] [<ffffffffa0145607>] mga_crtc_prepare+0x167/0x370 [mgag200]
[321381.680404] [<ffffffffa04f38b6>] drm_crtc_helper_set_mode+0x2d6/0x530 [drm_kms_helper]
[321381.689453] [<ffffffffa04f4896>] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
[321381.698706] [<ffffffffa00a3318>] drm_mode_set_config_internal+0x68/0x100 [drm]
[321381.706971] [<ffffffffa04fe8b2>] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
[321381.715244] [<ffffffffa04feaa3>] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
[321381.724780] [<ffffffffa04ff6f9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
[321381.733144] [<ffffffff8108270d>] notifier_call_chain+0x4d/0x80
[321381.739859] [<ffffffff81082791>] atomic_notifier_call_chain+0x21/0x30
[321381.747252] [<ffffffff815792d4>] panic+0xee/0x1f5
[321381.752704] [<ffffffff8102d272>] mce_panic+0x1e2/0x200
[321381.758640] [<ffffffff8102d303>] mce_timed_out+0x73/0x80
[321381.764762] [<ffffffff8102e8a1>] do_machine_check+0x5f1/0xae0
[321381.771377] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
[321381.777499] [<ffffffff81585d49>] machine_check+0x29/0x50
[321381.783630] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
[321381.789760] <<EOE>> [<ffffffff81450170>] cpuidle_enter_state+0x70/0x1f0
[321381.797457] [<ffffffff81450327>] cpuidle_enter+0x17/0x20
[321381.803586] [<ffffffff810a5968>] cpu_startup_entry+0x308/0x390
[321381.810297] [<ffffffff8103a203>] start_secondary+0x143/0x170
[321381.816814] ---[ end trace 9f2a977c4a9be24e ]---
[321381.822068] bad: scheduling from the idle thread!
[321381.827421] CPU: 136 PID: 0 Comm: swapper/136 Tainted: G W E 4.1.0-rc8-7-default+ #4
[321381.837238] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[321381.848898] ffff88046e4bc740 ffff88047fc887a8 ffffffff8157d67e 0000000000000000
[321381.857305] ffff88047fc95300 ffff88047fc887c8 ffffffff81093675 ffff88047fc88808
[321381.865713] ffff88047fc95300 ffff88047fc887f8 ffffffff8108796c 0000000100000000
[321381.874124] Call Trace:
[321381.876951] <#MC> [<ffffffff8157d67e>] dump_stack+0x4c/0x65
[321381.883483] [<ffffffff81093675>] dequeue_task_idle+0x35/0x50
[321381.890001] [<ffffffff8108796c>] dequeue_task+0x5c/0x80
[321381.896027] [<ffffffff8108c56b>] deactivate_task+0x2b/0x30
[321381.902352] [<ffffffff8157fcea>] __schedule+0x64a/0x910
[321381.908385] [<ffffffff8157ffee>] schedule+0x3e/0x90
[321381.914030] [<ffffffff81582be8>] schedule_timeout+0x148/0x280
[321381.920636] [<ffffffff812cc9fb>] ? idr_alloc+0x7b/0xe0
[321381.926570] [<ffffffff810c9c90>] ? internal_add_timer+0x80/0x80
[321381.933382] [<ffffffff810cb85c>] msleep+0x3c/0x50
[321381.938835] [<ffffffffa0145607>] mga_crtc_prepare+0x167/0x370 [mgag200]
[321381.946428] [<ffffffffa04f38b6>] drm_crtc_helper_set_mode+0x2d6/0x530 [drm_kms_helper]
[321381.955478] [<ffffffffa04f4896>] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
[321381.964731] [<ffffffffa00a3318>] drm_mode_set_config_internal+0x68/0x100 [drm]
[321381.973004] [<ffffffffa04fe8b2>] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
[321381.981277] [<ffffffffa04feaa3>] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
[321381.990811] [<ffffffffa04ff6f9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
[321381.999174] [<ffffffff8108270d>] notifier_call_chain+0x4d/0x80
[321382.005887] [<ffffffff81082791>] atomic_notifier_call_chain+0x21/0x30
[321382.013280] [<ffffffff815792d4>] panic+0xee/0x1f5
[321382.018731] [<ffffffff8102d272>] mce_panic+0x1e2/0x200
[321382.024660] [<ffffffff8102d303>] mce_timed_out+0x73/0x80
[321382.030787] [<ffffffff8102e8a1>] do_machine_check+0x5f1/0xae0
[321382.037404] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
[321382.043533] [<ffffffff81585d49>] machine_check+0x29/0x50
[321382.049665] [<ffffffff81348eaf>] ? intel_idle+0xbf/0x130
[321382.055794] <<EOE>> [<ffffffff81450170>] cpuidle_enter_state+0x70/0x1f0
[321382.063491] [<ffffffff81450327>] cpuidle_enter+0x17/0x20
[321382.069623] [<ffffffff810a5968>] cpu_startup_entry+0x308/0x390
[321382.076335] [<ffffffff8103a203>] start_secondary+0x143/0x170
[321382.082877] ------------[ cut here ]------------


2)

bkd04sdp:~ # [58109.056018] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[58109.056058] mce: [Hardware Error]: Machine check events logged
[58110.109873] Shutting down cpus with NMI
[58110.176778] Kernel Offset: disabled
[58110.180667] drm_kms_helper: panic occurred, switching back to text console
[58110.188367] mga_delay choosing mdelay...
[58110.242399] mga_delay choosing mdelay...
[58110.266768] ------------[ cut here ]------------
[58110.271926] kernel BUG at mm/vmalloc.c:1335!
[58110.276695] invalid opcode: 0000 [#1] SMP
[58110.281289] Modules linked in: einj(E) nmioe(E) iscsi_ibft(E) iscsi_boot_sysfs(E) af_packet(E) btrfs(E) xor(E) raid6_pq(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm(E) joydev(E) iTCO_wdt(E) iTCO_vendor_support(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) sb_edac(E) ablk_helper(E) lpc_ich(E) cryptd(E) pcspkr(E) edac_core(E) mfd_core(E) i2c_i801(E) shpchp(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) acpi_pad(E) processor(E) button(E) dm_mod(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sr_mod(E) cdrom(E) sd_mod(E) mgag200(E) syscopyarea(E) sysfillrect(E) ahci(E) ehci_pci(E) sysimgblt(E) drm_kms_helper(E) ehci_hcd(E) ixgbe(E) igb(E) ttm(E) libahci(E) mdio(E) ptp(E) usbcore(E) pps_core(E) drm(E) libata(E) i2c_algo_bit(E) usb_common(E) dca(E) megaraid_sas(E) sg(E) scsi_mod(E) autofs4(E)
[58110.371884] CPU: 75 PID: 0 Comm: swapper/75 Tainted: G E 4.1.0-rc8-7-default+ #10
[58110.381506] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0059.R00.1501081238 01/08/2015
[58110.393063] task: ffff88046ea6d580 ti: ffff88046ea70000 task.ti: ffff88046ea70000
[58110.401422] RIP: 0010:[<ffffffff81189c65>] [<ffffffff81189c65>] __get_vm_area_node+0x155/0x160
[58110.411156] RSP: 0018:ffff88047f7284b8 EFLAGS: 00010006
[58110.417091] RAX: 0000000080010003 RBX: 0000000091000000 RCX: ffffc90000000000
[58110.425065] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 00000000000eb000
[58110.433038] RBP: ffff88047f7284f8 R08: ffffe8ffffffffff R09: 00000000ffffffff
[58110.441010] R10: ffff880036a6a700 R11: ffff880460ab69c0 R12: 00000000910eb000
[58110.448983] R13: 0000000000000001 R14: 0000000091000000 R15: 00000000000eb000
[58110.456955] FS: 0000000000000000(0000) GS:ffff88047f720000(0000) knlGS:0000000000000000
[58110.465994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58110.472413] CR2: 00007f5bdb9ac095 CR3: 0000000001a0b000 CR4: 00000000001407e0
[58110.480386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58110.488358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58110.496331] Stack:
[58110.498577] ffff88047f728518 ffffc90000000000 00000000910eafff 0000000091000000
[58110.506885] 00000000910eb000 0000000000000001 0000000091000000 00000000000eb000
[58110.515192] ffff88047f728518 ffffffff8118ad20 00000000000000d0 ffffffffa0334f78
[58110.523499] Call Trace:
[58110.526229] <#MC>
[58110.528382] [<ffffffff8118ad20>] get_vm_area_caller+0x40/0x50
[58110.535111] [<ffffffffa0334f78>] ? ttm_mem_reg_ioremap+0xc8/0x110 [ttm]
[58110.542607] [<ffffffff81050f18>] __ioremap_caller+0x188/0x390
[58110.549127] [<ffffffff812d59b9>] ? find_next_bit+0x19/0x20
[58110.555353] [<ffffffff81051177>] ioremap_wc+0x17/0x20
[58110.561099] [<ffffffffa0334f78>] ttm_mem_reg_ioremap+0xc8/0x110 [ttm]
[58110.568398] [<ffffffffa0335351>] ttm_bo_move_memcpy+0xd1/0x700 [ttm]
[58110.575598] [<ffffffff811a6c55>] ? __kmalloc+0x4b5/0x4c0
[58110.581632] [<ffffffffa01d9b48>] mgag200_bo_move+0x18/0x20 [mgag200]
[58110.588830] [<ffffffffa0332ea0>] ttm_bo_handle_move_mem+0x260/0x590 [ttm]
[58110.596514] [<ffffffffa03337d2>] ? ttm_bo_mem_space+0xd2/0x320 [ttm]
[58110.603705] [<ffffffffa0333eb2>] ttm_bo_validate+0x1c2/0x1d0 [ttm]
[58110.610711] [<ffffffff8113f681>] ? irq_work_queue+0x11/0x90
[58110.617037] [<ffffffffa01da3d3>] mgag200_bo_push_sysram+0x93/0xe0 [mgag200]
[58110.624915] [<ffffffffa01d5a26>] mga_crtc_do_set_base.isra.8.constprop.21+0x76/0x410 [mgag200]
[58110.634636] [<ffffffffa01d6e02>] mga_crtc_mode_set+0x1042/0x2140 [mgag200]
[58110.642416] [<ffffffffa01d5492>] ? mga_crtc_prepare+0x132/0x370 [mgag200]
[58110.650106] [<ffffffffa04ec8db>] drm_crtc_helper_set_mode+0x2fb/0x530 [drm_kms_helper]
[58110.659052] [<ffffffffa04ed896>] drm_crtc_helper_set_config+0x856/0xa70 [drm_kms_helper]
[58110.668217] [<ffffffffa00b5318>] drm_mode_set_config_internal+0x68/0x100 [drm]
[58110.676388] [<ffffffffa04f78b2>] restore_fbdev_mode+0xc2/0xf0 [drm_kms_helper]
[58110.684558] [<ffffffffa04f7aa3>] drm_fb_helper_force_kernel_mode+0x73/0xb0 [drm_kms_helper]
[58110.693989] [<ffffffffa04f86f9>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
[58110.702260] [<ffffffff81081bad>] notifier_call_chain+0x4d/0x80
[58110.708873] [<ffffffff81081c31>] atomic_notifier_call_chain+0x21/0x30
[58110.716169] [<ffffffff8156f954>] panic+0xee/0x1f5
[58110.721530] [<ffffffff8102d232>] mce_panic+0x1e2/0x200
[58110.727366] [<ffffffff8102d2c3>] mce_timed_out+0x73/0x80
[58110.733396] [<ffffffff8102e861>] do_machine_check+0x5f1/0xae0
[58110.739915] [<ffffffff8133f52f>] ? intel_idle+0xbf/0x130
[58110.745952] [<ffffffff8157c3c9>] machine_check+0x29/0x50
[58110.751984] [<ffffffff8133f52f>] ? intel_idle+0xbf/0x130
[58110.758017] <<EOE>>
[58110.760362] [<ffffffff814467f0>] cpuidle_enter_state+0x70/0x1f0
[58110.767275] [<ffffffff814469a7>] cpuidle_enter+0x17/0x20
[58110.773309] [<ffffffff810a4e18>] cpu_startup_entry+0x308/0x390
[58110.779916] [<ffffffff8103a163>] start_secondary+0x143/0x170
[58110.786325] Code: 00 00 48 0f bd cf 83 c1 01 83 f9 0c 0f 4c c8 b0 1e 83 f9 1e 0f 4f c8 49 d3 e6 e9 f8 fe ff ff 48 89 df e8 9f a8 01 00 31 c0 eb b8 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 c8 41
[58110.808146] RIP [<ffffffff81189c65>] __get_vm_area_node+0x155/0x160
[58110.815257] RSP <ffff88047f7284b8>
[58110.820218] ---[ end trace ab0c230901a0ee95 ]---

Thanks
Rui
