/proc/<pid>/status & task struct locking
From: Dave Jones
Date: Fri Apr 15 2016 - 12:49:17 EST
I've been chasing this bug, which shows up occasionally.
We've seen it as far back as 3.10 and, more recently, on 4.0,
and the affected code doesn't look to have changed much since.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000019
IP: [<ffffffff81442bc0>] vsnprintf+0x60/0x580
PGD 42c42a067 PUD 28a441067 PMD 0
Oops: 0000 [#1] SMP
CPU: 13 PID: 2364235 Comm: ps Not tainted 4.0.9 #1
task: ffff880445f16880 ti: ffff8801ac52c000 task.ti: ffff8801ac52c000
RIP: 0010:[<ffffffff81442bc0>] [<ffffffff81442bc0>] vsnprintf+0x60/0x580
RSP: 0018:ffff8801ac52fc08 EFLAGS: 00010286
RAX: 0000000000000019 RBX: ffff88033a46720d RCX: ffffffff81981e31
RDX: 0000000000000025 RSI: ffff8801ac52fc40 RDI: ffffffff81981e18
RBP: ffff8801ac52fc78 R08: ffffffff81981e18 R09: 00000000ffffffff
R10: 0000000000000000 R11: ffff8801ac52fa9e R12: ffff88033a468000
R13: ffff8801ac52fcb0 R14: 000000000001f199 R15: ffffffff81981e18
FS: 00007f907481a880(0000) GS:ffff88046fba0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000019 CR3: 0000000342931000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
0000000000000017 0000000000000001 0000000000001000 ffffffff81977b7f
ffff88033a46720d 0000000000000df3 ffffffff811d7b39 0000000000000000
0000000000000296 ffff8801cffcc780 ffff8801cffcc780 ffff88046d7c7200
Call Trace:
[<ffffffff811d7b39>] ? seq_vprintf+0x39/0x70
[<ffffffff811d7b35>] seq_vprintf+0x35/0x70
[<ffffffff811d7bad>] seq_printf+0x3d/0x40
[<ffffffff8121b675>] proc_pid_status+0x735/0x7c0
[<ffffffff811bdc32>] ? path_cleanup+0x42/0x60
[<ffffffff812168f4>] proc_single_show+0x54/0xa0
[<ffffffff811d746a>] seq_read+0xea/0x370
[<ffffffff811b4cc0>] __vfs_read+0x20/0x60
[<ffffffff811b4d86>] vfs_read+0x86/0x140
[<ffffffff811b4e86>] SyS_read+0x46/0xb0
[<ffffffff8175930e>] ? int_check_syscall_exit_work+0x34/0x3d
[<ffffffff817590f2>] system_call_fastpath+0x12/0x17
Code: 89 cd 49 01 fc 0f 82 18 03 00 00 48 89 7d b0 41 0f b6 07 0f 1f 84 00 00 00 00 00 84 c0 74 43 48 8d 75 c8 4c 89 ff e8 30 d4 ff ff <0f> b6 55 c8 48 63 c8 4d 8d 34 0f 80 fa 07 0f 87 4c 02 00 00 ff
RIP [<ffffffff81442bc0>] vsnprintf+0x60/0x580
RSP <ffff8801ac52fc08>
My first thought on this was "ps is reading /proc/<pid>/status while the
process is exiting/crashing".
Looking at proc_pid_status(), I see a bunch of the functions it calls
take task_lock(), but there are also a bunch of bare task struct dereferences.
Wrapping those in task_lock()/task_unlock() pairs might narrow the window
somewhat, but it still seems like there would be a small opportunity for
this same scenario to exist.
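To make the concern concrete, here is a minimal user-space sketch of the pattern, not kernel code. The names fake_task, fake_task_snapshot_name() and fake_task_teardown() are hypothetical, and a pthread mutex stands in for task_lock()/task_unlock(). It assumes the field in question is cleared under the same lock on the exit path, which may not hold for every field proc_pid_status() touches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a task; 'name' may be cleared by an exit path. */
struct fake_task {
    pthread_mutex_t lock;   /* plays the role of task_lock()/task_unlock() */
    const char *name;       /* pointer that teardown sets to NULL */
};

/*
 * Copy the name into buf while holding the lock, and never touch
 * t->name after dropping it. Returns 0 on success, -1 if the field
 * was already torn down or buf has no room.
 */
static int fake_task_snapshot_name(struct fake_task *t, char *buf, size_t len)
{
    int ret = -1;

    assert(t != NULL && buf != NULL);
    if (len == 0)
        return -1;

    pthread_mutex_lock(&t->lock);
    if (t->name != NULL) {
        /* The NULL check and the copy happen under one lock hold. */
        snprintf(buf, len, "%s", t->name);
        ret = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return ret;
}

/* Exit side: clears the pointer under the same lock. */
static void fake_task_teardown(struct fake_task *t)
{
    pthread_mutex_lock(&t->lock);
    t->name = NULL;
    pthread_mutex_unlock(&t->lock);
}
```

The point of the sketch is that the check and the copy sit inside one critical section. Taking the lock, dropping it, and then dereferencing the pointer again would leave the same window open, only narrower.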
What puzzles me is that I can't reproduce this bug for the life of me,
even by adding sleeps in proc_pid_status(), so I'm wondering if I'm way off base.
Is there some other locking at play here protecting the task struct that I'm overlooking?
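For reference, the kind of user-space stress loop I have in mind looks roughly like the sketch below. It is only an illustration: read_status_once() and stress_status() are hypothetical helper names, and a child that exits cleanly may well not hit the path that crashed above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read /proc/<pid>/status once; returns bytes read, or -1 on any error. */
static ssize_t read_status_once(pid_t pid, char *buf, size_t len)
{
    char path[64];
    ssize_t n;
    int fd;

    assert(buf != NULL);
    if (len == 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* the task may already be gone */
    n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}

/*
 * Fork 'iterations' children that exit immediately and read each child's
 * status file around the time it exits. Returns the number of reads that
 * returned data, or -1 if fork() fails.
 */
static int stress_status(int iterations)
{
    char buf[4096];
    int ok = 0;

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);           /* child: exit straight away */
        if (read_status_once(pid, buf, sizeof(buf)) > 0)
            ok++;
        waitpid(pid, NULL, 0);  /* reap the child */
    }
    return ok;
}
```

Calling stress_status() with a large iteration count from several processes at once is the obvious way to widen the window, though nothing here guarantees the read lands in the middle of the exit.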
Dave