Re: [PATCH] edac:Fix kernel panic regression in edac_mc_reset_delay_period

From: Borislav Petkov
Date: Thu May 19 2016 - 17:45:21 EST


On Thu, May 19, 2016 at 03:44:57PM -0400, Nicholas Krause wrote:
> This fixes a kernel panic regression in the function,
> edac_mc_reset_delay_period as show by this kernel panic
> trace:
> [ 58.402137] BUG: unable to handle kernel paging request at 0000000000015d10
> [ 58.410564] IP: [<ffffffff8109ab82>] queued_spin_lock_slowpath+0x132/0x170
> [ 58.418941] PGD 3ffcc8067 PUD 3ffc56067 PMD 0
> [ 58.428821] Oops: 0002 [#1] SMP
> [ 58.439076] Modules linked in: xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables
> [ 58.468176] CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
                                      ^^^^^^^^
Ha, what is that program?

> [ 58.478878] Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
> [ 58.488590] task: ffff8803ff9a9300 ti: ffff8803ffbf0000 task.ti: ffff8803ffbf0000
> [ 58.499562] RIP: 0010:[<ffffffff8109ab82>] [<ffffffff8109ab82>] queued_spin_lock_slowpath+0x132/0x170
> [ 58.521850] RSP: 0018:ffff8803ffbf3cf8 EFLAGS: 00010002
> [ 58.532653] RAX: 0000000000002bfe RBX: 0000000000000082 RCX: 0000000000080000
> [ 58.545334] RDX: 0000000000015d10 RSI: 00000000affd0fc4 RDI: ffffffff81d39940
> [ 58.555376] RBP: ffff88040a97b848 R08: ffff88041ed15d00 R09: 0000000000000004
> [ 58.565813] R10: 000000000000000a R11: f000000000000000 R12: ffffffff81d39940
> [ 58.577911] R13: 000000000000c940 R14: ffff8803ffbf3d48 R15: ffff8803ffbf3f28
> [ 58.588311] FS: 00007f639468f780(0000) GS:ffff88041ed00000(0000) knlGS:00000000f7743680
> [ 58.598270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 58.609814] CR2: 0000000000015d10 CR3: 00000003ffafa000 CR4: 00000000000006e0
> [ 58.620848] Stack:
> [ 58.630118] ffffffff81774d3f 000000000000000f ffffffff810ae889 ffff88040a97b820
> [ 58.640635] ffff8803ffbf3d90 0000000000002000 ffff88040c335c00 00000000000003e8
> [ 58.652220] ffffffff810aed20 0000000000000041 0000000200000000 ffff88040a97b800
> [ 58.662230] Call Trace:
> [ 58.672043] [<ffffffff81774d3f>] ? _raw_spin_lock_irqsave+0x1f/0x30
> [ 58.682221] [<ffffffff810ae889>] ? lock_timer_base.isra.34+0x49/0x60
> [ 58.693178] [<ffffffff810aed20>] ? del_timer+0x30/0x70
> [ 58.704839] [<ffffffff81075494>] ? try_to_grab_pending+0xa4/0x140
> [ 58.715206] [<ffffffff81075569>] ? mod_delayed_work_on+0x39/0x80
> [ 58.725250] [<ffffffff81684e90>] ? edac_mc_reset_delay_period+0x30/0x50
> [ 58.735572] [<ffffffff81685865>] ? edac_set_poll_msec+0x45/0x60
> [ 58.745346] [<ffffffff8107a43b>] ? param_attr_store+0x6b/0xe0
> [ 58.755254] [<ffffffff81079975>] ? module_attr_store+0x15/0x20
> [ 58.764869] [<ffffffff811f7192>] ? kernfs_fop_write+0x142/0x190
> [ 58.774516] [<ffffffff81187a1e>] ? __vfs_write+0x1e/0xe0
> [ 58.783565] [<ffffffff811879d4>] ? __vfs_read+0xa4/0xd0
> [ 58.792437] [<ffffffff811a47a7>] ? __alloc_fd+0x37/0x160
> [ 58.801108] [<ffffffff811887f0>] ? vfs_write+0xb0/0x1b0
> [ 58.809465] [<ffffffff81189bdb>] ? SyS_write+0x4b/0xb0
> [ 58.817707] [<ffffffff81774f5f>] ? entry_SYSCALL_64_fastpath+0x17/0x93
> [ 58.825626] Code: f8 66 c7 07 01 00 c3 66 90 f3 c3 48 89 c2 c1 e8 12 48 c1 ea 0c ff c8 83 e2 30 48 98 48 81 c2 00 5d 01 00 48 03 14 c5 40 24 d1 81 <4c> 89 02 41 8b 40 08 85 c0 75 0a f3 90 41 8b 40 08 85 c0 74 f6
> [ 58.852733] RIP [<ffffffff8109ab82>] queued_spin_lock_slowpath+0x132/0x170
> [ 58.861275] RSP <ffff8803ffbf3cf8>
> [ 58.869458] CR2: 0000000000015d10
> [ 58.877632] ---[ end trace 3f286bc71cca15d1 ]---
> [ 58.885869] Kernel panic - not syncing: Fatal exception

So I see the splat but the fix does not look correct... It looks more
like an uninitialized workqueue somewhere. How do you trigger this?

By writing some values into
/sys/module/edac_core/parameters/edac_mc_poll_msec ? I guess that's what
that edactest program does.
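
For reference, a minimal user-space sketch of what such a reproducer might
do, assuming it simply writes a new poll period into the parameter file.
The helper name write_param() is made up for illustration; the actual
edactest source has not been posted in this thread.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the string 'value' to the file at 'path'.
 * Returns 0 if the whole string was written, -1 on any failure.
 * The file is not created if it does not exist, which matches how a
 * sysfs parameter file behaves.
 */
static int write_param(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, value, len);
	close(fd);

	return (n == (ssize_t)len) ? 0 : -1;
}
```

Calling write_param("/sys/module/edac_core/parameters/edac_mc_poll_msec",
"2000") as root would go through the param_attr_store() path seen in the
call trace above.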

Can I have your .config please?

...

Ok, I think I see it - we initialize the workqueues only when
->edac_check is defined. And you're probably using an EDAC driver which
doesn't define that function, thus the splat.

But which driver are you using? I don't see it in your module list. So
it is either compiled in or you've loaded only edac_core.ko.

If you want to write a proper fix, here's a hint: look at
->op_state. That is what should be tested.

:-)
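
To make the hint concrete, here is a small user-space model of the idea,
not kernel code and not a proposed patch. All names in it (mci_model,
MODEL_OP_*, model_add_mc(), model_reset_delay_period()) are stand-ins
invented for this sketch. It assumes that registration marks a controller
as polling only when a check callback exists, and that the reset path
then skips every controller that is not polling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the controller operating state. */
enum model_op_state {
	MODEL_OP_OFFLINE,
	MODEL_OP_RUNNING_POLL,
	MODEL_OP_RUNNING_INTERRUPT,
};

/* Simplified stand-in for a memory controller instance. */
struct mci_model {
	enum model_op_state op_state;
	bool work_initialized;     /* true only if polling work was set up */
	unsigned long delay_msec;  /* current poll period, 0 if not polling */
};

/*
 * Models registration: the polling work is initialized only when the
 * driver provides a check callback, as described in the mail above.
 */
static void model_add_mc(struct mci_model *mci, bool has_edac_check,
			 unsigned long msec)
{
	mci->work_initialized = false;
	mci->delay_msec = 0;

	if (has_edac_check) {
		mci->op_state = MODEL_OP_RUNNING_POLL;
		mci->work_initialized = true;
		mci->delay_msec = msec;
	} else {
		mci->op_state = MODEL_OP_RUNNING_INTERRUPT;
	}
}

/*
 * Models the poll period reset: only controllers in the polling state
 * are touched, so work that was never initialized is left alone.
 * Returns the number of controllers whose period was changed.
 */
static size_t model_reset_delay_period(struct mci_model *mcis, size_t n,
				       unsigned long msec)
{
	size_t changed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (mcis[i].op_state != MODEL_OP_RUNNING_POLL)
			continue;

		mcis[i].delay_msec = msec;
		changed++;
	}

	return changed;
}
```

With one polling and one interrupt-driven controller registered, a reset
changes only the polling one. Without the state test, the reset would
reach for work that was never set up, which is the situation the splat
points at.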

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.