Re: Boot regression caused by kauditd

From: Paul Moore
Date: Wed Apr 26 2017 - 17:20:52 EST


On Wed, Apr 26, 2017 at 4:48 PM, Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
> Hi,
>
> I don't know if anyone else already reported this, but I randomly get
> the following kernel crash during boot (Fedora 18, I know it's pretty
> old!!):
>
> [ 20.940715] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000004
> [ 20.942122] IP: get_net+0x7/0xd
> [ 20.942739] PGD 0
> [ 20.942741]
> [ 20.943370] Oops: 0002 [#1] SMP
> [ 20.943910] CPU: 0 PID: 56 Comm: kauditd Not tainted 4.11.0-rc7+ #499
> [ 20.945056] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.8.2-20150714_191134- 04/01/2014
> [ 20.946675] task: ffff8800783f8040 task.stack: ffffc90000660000
> [ 20.947687] RIP: 0010:get_net+0x7/0xd
> [ 20.948310] RSP: 0018:ffffc90000663e98 EFLAGS: 00010202
> [ 20.949409] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 9dd28c6188bbd6db
> [ 20.951093] RDX: ffffffff8111556a RSI: 0000000057539c73 RDI: 0000000000000000
> [ 20.952382] RBP: ffffc90000663e98 R08: 0000000000000000 R09: 0000000000000000
> [ 20.953658] R10: ffffc90000663e20 R11: 0000000000000000 R12: 0000000000000000
> [ 20.954946] R13: 0000000000000000 R14: ffff8800783f8040 R15: ffff8800783f8040
> [ 20.956144] FS: 0000000000000000(0000) GS:ffff88007d200000(0000)
> knlGS:0000000000000000
> [ 20.957628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 20.958672] CR2: 0000000000000004 CR3: 0000000062ea3000 CR4: 00000000000406f0
> [ 20.959943] Call Trace:
> [ 20.960380] kauditd_thread+0x60/0x1dc
> [ 20.961027] ? signal_pending_state+0x2f/0x2f
> [ 20.961771] ? auditd_reset+0x5f/0x5f
> [ 20.962407] kthread+0x107/0x10f
> [ 20.962963] ? __list_del_entry+0x22/0x22
> [ 20.963657] ret_from_fork+0x2e/0x40
> [ 20.964267] Code: 00 00 5b 41 5c 5d c3 31 c0 83 bf 84 00 00 00 00
> 55 48 89 e5 75 0c 8b 87 cc 00 00 00 2b 87 c8 00 00 00 5d c3 55 48 89
> f8 48 89 e5 <f0> ff 47 04 5d c3 f0 ff 4f 04 74 01 c3 55 48 89 e5 e8 68
> 6b 7a
> [ 20.967651] RIP: get_net+0x7/0xd RSP: ffffc90000663e98
> [ 20.968526] CR2: 0000000000000004
> [ 20.969094] ---[ end trace aab4cbd6ae264eab ]---
> [ 20.969886] Kernel panic - not syncing: Fatal exception
> [ 20.971312] Kernel Offset: disabled
> [ 20.971928] ---[ end Kernel panic - not syncing: Fatal exception
>
>
> I guess commit 264d509637d95f9404e introduced this because I never see
> this crash before pulling net-next today, but given the fact I only
> saw this once so far, I can't just revert it to verify if it is the
> case... It is probably some race condition.
>
> Please let me know if you need any other information.

Hi,

Thanks for the report, this is the only one like it that I've seen.
I'm looking at the code in Linus' tree and I'm not seeing anything
obvious ... looking at the trace above it appears that the problem is
when get_net() goes to bump the refcount and the passed net pointer is
NULL; unless I'm missing something, the only way this would happen in
kauditd_thread() is if the auditd_conn.pid value is non-zero but the
auditd_conn.net pointer is NULL.

That shouldn't happen.

The only way I could see that even being possible is if the
sock_net(NETLINK_CB(skb).sk) call in audit_receive_msg() returned NULL
which in turned was passed as the net pointer (third parameter) in the
auditd_set() call. Once again, I don't think this should ever be
possible? Am I missing something?

I realize you aren't able to reproduce this reliably, but if you do,
any chance you try it with Linus' tree? I'd like to see if we can
rule out the changes in net-next (my testing doesn't typically include
net-next patches).

--
paul moore
www.paul-moore.com