Re: BUG: corrupted list in freeary

From: Dmitry Vyukov
Date: Fri Nov 30 2018 - 13:06:55 EST


On Fri, Nov 30, 2018 at 5:58 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Thu, Nov 29, 2018 at 9:13 AM, Manfred Spraul
> <manfred@xxxxxxxxxxxxxxxx> wrote:
>> Hello together,
>>
>> On 11/27/18 4:52 PM, syzbot wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: e195ca6cb6f2 Merge branch 'for-linus' of git://git.kernel...
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=10d3e6a3400000
>>
>> From the console output:
>>
>> 20:36:14 executing program 4:
>> semget$private(0x12000000, 0x39d0, 0x0)
>>
>>
>> I don't understand the 0x12000000.
>>
>> What does that mean? What is the actual syscall?
>
> Hi Manfred,
>
> The syscall is semget with the first argument 0x12000000.
>
>
>>
>> Is 0x39d0 the number of semaphores in the array, i.e. create ~13.000
>> semaphores?
>
> If the second argument of 0x39d0 relates to creation of 0x39d0
> semaphores, then yes.
>
>
>
>> kernel config: https://syzkaller.appspot.com/x/.config?x=73e2bc0cb6463446
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909
>> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+c92d3646e35bc5d1a909@xxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> input: syz1 as /devices/virtual/input/input670
>> input: syz1 as /devices/virtual/input/input671
>> list_del corruption. prev->next should be ffff8881dae2cdb8, but was
>> 0000000000100000
>> ------------[ cut here ]------------
>> kernel BUG at lib/list_debug.c:53!
>> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
>> CPU: 0 PID: 6194 Comm: syz-executor5 Not tainted 4.20.0-rc3+ #348
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> RIP: 0010:__list_del_entry_valid.cold.1+0x48/0x4a lib/list_debug.c:51
>> Code: d0 60 88 e8 b2 31 d2 fd 0f 0b 48 89 de 48 c7 c7 00 d2 60 88 e8 a1 31
>> d2 fd 0f 0b 48 89 de 48 c7 c7 a0 d1 60 88 e8 90 31 d2 fd <0f> 0b 48 89 d9 48
>> c7 c7 60 d2 60 88 e8 7f 31 d2 fd 0f 0b 48 89 f1
>> RSP: 0018:ffff8881848fee80 EFLAGS: 00010286
>> RAX: 0000000000000054 RBX: ffff8881dae2cdb8 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: ffffffff8165eaf5 RDI: 0000000000000005
>> RBP: ffff8881848fee98 R08: ffff8881848f26c0 R09: 0000000000000006
>> R10: 0000000000000000 R11: ffff8881848f26c0 R12: ffff8881c3173a00
>> R13: ffff8881be118118 R14: ffff8881848ff280 R15: dffffc0000000000
>> FS: 00000000020b2940(0000) GS:ffff8881dae00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000625208 CR3: 00000001c10d3000 CR4: 00000000001406f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> __list_del_entry include/linux/list.h:117 [inline]
>> list_del include/linux/list.h:125 [inline]
>> unlink_queue ipc/sem.c:786 [inline]
>>
>> Unlink_queue means transfer all waiting threads to the wake-q.
>>
>> There are 2*(1+<semaphores in the array>) linked lists in an array.
>>
>> And this fails, because one linked list contains 0x100000 instead of a real
>> pointer.
>>
>> I could not find any semop() in the log --> all lists must be empty.
>>
>> Actually, the lists were initialized in newary(), and then never touched.
>>
>> freeary+0xbd1/0x1a40 ipc/sem.c:1160
>>
>> Free a semaphore array
>>
>> free_ipcs+0x9f/0x1c0 ipc/namespace.c:112
>> sem_exit_ns+0x20/0x40 ipc/sem.c:237
>> free_ipc_ns ipc/namespace.c:120 [inline]
>>
>> Free all ipc ids in the name space
>>
>> put_ipc_ns+0x66/0x180 ipc/namespace.c:152
>> free_nsproxy+0xcf/0x220 kernel/nsproxy.c:180
>>
>> Free the name space
>>
>> switch_task_namespaces+0xb3/0xd0 kernel/nsproxy.c:229
>> exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
>> do_exit+0x1ad1/0x26d0 kernel/exit.c:866
>> do_group_exit+0x177/0x440 kernel/exit.c:970
>> get_signal+0x8b0/0x1980 kernel/signal.c:2517
>> do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
>> exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
>> prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>> syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>> do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> RIP: 0033:0x410fa0
>>
>> This is time code 604.599748 in the console output:
>>
>> [ 604.599748] RIP: 0033:0x410fa0
>>
>>
>> Questions:
>>
>> 1) What is this?
>>
>> [ 600.924691] entry_SYSCALL_64_after_hwframe+0x49/0xbe^M
>> [ 600.929872] RIP: 0033:0x7f3e597d0120^M
>> [ 600.933576] Code: Bad RIP value.^M
>> [ 600.936920] RSP: 002b:00007ffc2d83e008 EFLAGS: 00000246 ORIG_RAX:
>> 0000000000000002^M
>> [ 600.944608] RAX: ffffffffffffffda RBX: 000055ca2995b436 RCX:
>> 00007f3e597d0120^M
>> [ 600.951856] RDX: 00007ffc2d83e244 RSI: 0000000000080000 RDI:
>> 00007ffc2d83e220^M
>> [ 600.959107] RBP: 000055ca2995b1e0 R08: 0000000000000000 R09:
>> 000055ca2995b099^M
>> [ 600.966355] R10: 0000000000000000 R11: 0000000000000246 R12:
>> 0000000000000001^M
>> [ 600.973628] R13: 000055ca2995b090 R14: 000055ca2995b190 R15:
>> 00007ffc2d83e220^M
>>
>> Isn't this a kernel stack overrun?
>>
>> RSP: 0x..83e008. Assuming 8 kB kernel stack, and 8 kB alignment, we have
>> used up everything.
>
> I don't exact answer, that's just the kernel output that we captured
> from console.
>
> FWIW with KASAN stacks are 16K:
> https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/page_64_types.h#L10
>
>
> But it looks like the OOM may have been caused by a page fault, so the
> IP/SP are legitimately in user-space, e.g. in the same log:
>
> [ 443.339635] __do_page_fault+0x5e8/0xe60
> [ 443.343683] ? trace_hardirqs_off+0xb8/0x310
> [ 443.348079] do_page_fault+0xf2/0x7e0
> [ 443.351867] ? vmalloc_sync_all+0x30/0x30
> [ 443.356002] ? error_entry+0x70/0xd0
> [ 443.359702] ? trace_hardirqs_off_caller+0xbb/0x310
> [ 443.364702] ? trace_hardirqs_on_caller+0xc0/0x310
> [ 443.369620] ? syscall_return_slowpath+0x5e0/0x5e0
> [ 443.374537] ? trace_hardirqs_off_thunk+0x1a/0x1c
> [ 443.379390] ? trace_hardirqs_on_caller+0x310/0x310
> [ 443.384397] ? __bpf_trace_preemptirq_template+0x30/0x30
> [ 443.389837] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
> [ 443.395366] ? prepare_exit_to_usermode+0x291/0x3b0
> [ 443.400374] ? page_fault+0x8/0x30
> [ 443.403901] ? trace_hardirqs_off_thunk+0x1a/0x1c
> [ 443.408729] ? page_fault+0x8/0x30
> [ 443.412255] page_fault+0x1e/0x30
> [ 443.415693] RIP: 0033:0x43d8e8
> [ 443.418884] Code: Bad RIP value.
> [ 443.422229] RSP: 002b:000000c43a053490 EFLAGS: 00010206
>
>
>
>
>> - How/where are namespaces used by the bot?
>>
>> I.e. what triggered the namespace exit?
>
> It creates whole set of namespaces per test process:
> https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523
>
> but then each test can also execute unshare as well.
>
> Namespace exits usually happen when process terminates.
> In the reported stack it seems to be the case:
>
> exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:234
> do_exit+0x1ad1/0x26d0 kernel/exit.c:866
> do_group_exit+0x177/0x440 kernel/exit.c:970
> get_signal+0x8b0/0x1980 kernel/signal.c:2517
> do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
> exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
>
>
>> - There are ~370 calls to semget(), most with large (>10.000) semaphores in
>> the arrays.
>>
>> Starting from [442.544635], the OOM killer starts to kill processes.
>>
>> Is this as intended?
>
> Well, generally everything except for kernel crashes is expected.
>
> We actually sandbox it with memcg quite aggressively:
> https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L2159
> But it seems to manage to either break the limits, or cause some
> massive memory leaks. The nature of that is yet unknown.
>
>
>> - Which stress tests are enabled? By chance, I found:
>>
>> [ 433.304586] FAULT_INJECTION: forcing a failure.^M
>> [ 433.304586] name fail_page_alloc, interval 1, probability 0, space 0,
>> times 0^M
>> [ 433.316471] CPU: 1 PID: 19653 Comm: syz-executor4 Not tainted 4.20.0-rc3+
>> #348^M
>> [ 433.323841] Hardware name: Google Google Compute Engine/Google Compute
>> Engine, BIOS Google 01/01/2011^M
>>
>> I need some more background, then I can review the code.
>
> What exactly do you mean by "Which stress tests"?
> Fault injection is enabled. Also random workload from userspace.
>
>
>> Right now, I would put it into my "unknown syzcaller finding" folder.


Since it happened only once, it can well be a previous silent memory
corruption. But we also seen hundreds of subtle data races that ever
fired few times or just once, but turned out to be very real bugs.
Hard to say.