MCE bug?

From: Rui Wang
Date: Tue Jun 16 2015 - 22:15:20 EST


Hi Boris & Tony,

While injecting MCEs using einj, I encountered a panic:

[ 0.305697] mce: CPU supports 22 MCE banks
[ 0.310288] BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
[ 0.319057] IP: [<ffffffff8107d0f2>] __queue_work+0x32/0x370
[ 0.325398] PGD 0
[ 0.327656] Oops: 0000 [#1] SMP

...

[ 0.484045] Call Trace:
[ 0.486780] [<ffffffff8107d66b>] queue_work_on+0x2b/0x50
[ 0.492821] [<ffffffff8102e019>] mce_schedule_work.part.16+0x29/0x30
[ 0.500020] [<ffffffff8102f0d9>] machine_check_poll+0x249/0x260
[ 0.506733] [<ffffffff8102f123>] __mcheck_cpu_init_generic+0x33/0x100
[ 0.514018] [<ffffffff81030061>] mcheck_cpu_init+0x161/0x4b0
[ 0.520443] [<ffffffff81016095>] identify_cpu+0x365/0x450
[ 0.526576] [<ffffffff81b6144c>] identify_boot_cpu+0x10/0x7e
[ 0.532994] [<ffffffff81b614ee>] check_bugs+0x9/0x2d
[ 0.538643] [<ffffffff81b5b0a7>] start_kernel+0x469/0x495
[ 0.544771] [<ffffffff81b5aa2e>] ? set_init_arg+0x55/0x55
[ 0.550900] [<ffffffff81b5a120>] ? early_idt_handlers+0x120/0x120
[ 0.557805] [<ffffffff81b5a5ca>] x86_64_start_reservations+0x2a/0x2c
[ 0.565001] [<ffffffff81b5a709>] x86_64_start_kernel+0x13d/0x14c

The panic happened on the boot that followed a reboot caused by an injected fatal error. During early boot, machine_check_poll() looked for errors left over in the MCE banks and then called mce_schedule_work(). That appears to be too early: system_wq had not been allocated yet, hence the NULL pointer dereference.
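
To illustrate the ordering problem, here is a minimal userspace sketch of one way such a call could be guarded: if the queue does not exist yet, remember that work is pending and run it once the queue has been set up. This is not kernel code and not a proposed patch. Every name in it (fake_wq, fake_system_wq, schedule_error_work, fake_wq_init, process_logged_errors) is hypothetical and only stands in for system_wq and mce_schedule_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a workqueue; NOT the kernel's workqueue_struct. */
struct fake_wq {
	int executed;	/* number of work items run so far */
};

/* NULL until fake_wq_init() runs, like system_wq during early boot. */
static struct fake_wq *fake_system_wq;
static struct fake_wq fake_wq_storage;

/* Set when work was requested before the queue existed. */
static bool work_pending;

/* Hypothetical work handler: here it only counts invocations. */
static void process_logged_errors(struct fake_wq *wq)
{
	wq->executed++;
}

/*
 * Guarded scheduling: never touch the queue while it is still NULL.
 * Returns true if the work ran now, false if it was deferred.
 */
static bool schedule_error_work(void)
{
	if (fake_system_wq == NULL) {
		work_pending = true;	/* too early: remember it for later */
		return false;
	}
	process_logged_errors(fake_system_wq);
	return true;
}

/* Later initialization step: create the queue, then run deferred work. */
static void fake_wq_init(void)
{
	assert(fake_system_wq == NULL);	/* must only be initialized once */
	fake_wq_storage.executed = 0;
	fake_system_wq = &fake_wq_storage;

	if (work_pending) {
		work_pending = false;
		process_logged_errors(fake_system_wq);
	}
}
```

In this sketch, calling schedule_error_work() before fake_wq_init() only sets the pending flag instead of dereferencing a NULL queue, and the deferred work runs once during fake_wq_init(). Whether a guard like this, or simply not scheduling work from this early path, is the right approach for the MCE code is a separate question.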

Is this a known problem? My kernel is based on Linux 4.1.0-rc3-7.

Thanks
Rui

