Re: [RFC][PATCH 00/12] various perf fixes
From: Dmitry Vyukov
Date: Tue Jan 12 2016 - 06:26:55 EST
On Tue, Jan 12, 2016 at 12:01 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Tue, Jan 12, 2016 at 12:00 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> On Tue, Jan 12, 2016 at 11:57 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> On Tue, Jan 12, 2016 at 11:11 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>>
>>>> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been hunting perf bugs for the past few weeks. This resulted in this pile
>>>>> of patches, that mostly seems to work -- I still get an occasional fail so
>>>>> something is still off.
>>>>>
>>>>> But I've been sitting on this for far too long, so here goes.
>>>>>
>>>>> (I've not at all tested perf-cgroup)
>>>>>
>>>>> The code compiles in between patches, but I've not bothered trying to run/test
>>>>> any intermediate stage. There's just too many inter-related fail.
>>>>>
>>>>> Alexander, Stephane, could you guys please have a hard look at this?
>>>>>
>>>>> Andi, Dmitry, patch 10 should explain the getting stuck in
>>>>> perf_install_in_context() forever thing you both have observed.
>>>>
>>>> Btw., if there's no test failures I plan to apply and push this to Linus fairly
>>>> soon, so guys please give it all the review and testing you can.
>>>
>>>
>>> Yes, I was able to apply it on top of the latest Linus tree. Thanks.
>>> Now testing perf with these patches at full capacity.
>>
>> Hit this in a minute:
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 12905 at kernel/events/core.c:2651
>> task_ctx_sched_out+0x8d/0xa0()
>> Modules linked in:
>> CPU: 0 PID: 12905 Comm: syz-executor Not tainted 4.4.0+ #224
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> 00000000ffffffff ffff88003290f4e8 ffffffff8290d92d 0000000000000000
>> ffff8800348c5f00 ffffffff85fc05c0 ffff88003290f528 ffffffff813429e9
>> ffffffff81617ffd ffffffff85fc05c0 0000000000000a5b ffff880035c09188
>> Call Trace:
>> [< inline >] __dump_stack lib/dump_stack.c:15
>> [<ffffffff8290d92d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>> [<ffffffff813429e9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:483
>> [<ffffffff81342c19>] warn_slowpath_null+0x29/0x30 kernel/panic.c:516
>> [<ffffffff81617ffd>] task_ctx_sched_out+0x8d/0xa0 kernel/events/core.c:2651
>> [< inline >] perf_event_context_sched_out kernel/events/core.c:2550
>> [<ffffffff81622263>] __perf_event_task_sched_out+0x5d3/0x1230
>> kernel/events/core.c:2634
>> [< inline >] perf_event_task_sched_out include/linux/perf_event.h:948
>> [< inline >] prepare_task_switch kernel/sched/core.c:2613
>> [< inline >] context_switch kernel/sched/core.c:2771
>> [<ffffffff85e7d2d7>] __schedule+0xb27/0x1c50 kernel/sched/core.c:3282
>> [<ffffffff85e7eed2>] preempt_schedule_common+0x42/0x70 kernel/sched/core.c:3352
>> [<ffffffff85e7ef17>] _cond_resched+0x17/0x20 kernel/sched/core.c:4720
>> [< inline >] __wait_for_common kernel/sched/completion.c:90
>> [< inline >] wait_for_common kernel/sched/completion.c:101
>> [<ffffffff85e8158b>] wait_for_completion+0x8b/0x300
>> kernel/sched/completion.c:122
>> [<ffffffff8149255e>] __wait_rcu_gp+0x12e/0x1a0 kernel/rcu/update.c:365
>> [<ffffffff81498218>] synchronize_sched.part.58+0x88/0xb0 kernel/rcu/tree.c:3220
>> [<ffffffff814a19a3>] synchronize_sched+0xa3/0x120 kernel/rcu/tree.c:3217
>> [< inline >] synchronize_rcu include/linux/rcupdate.h:320
>> [<ffffffff817dd6be>] namespace_unlock+0xee/0x100 fs/namespace.c:1349
>> [<ffffffff817e671e>] drop_collected_mounts+0x8e/0xa0 fs/namespace.c:1762
>> [<ffffffff817eaf1c>] put_mnt_ns+0x4c/0x70 fs/namespace.c:3149
>> [<ffffffff813a7594>] free_nsproxy+0x44/0x1d0 kernel/nsproxy.c:161
>> [<ffffffff813a7992>] switch_task_namespaces+0xa2/0xc0 kernel/nsproxy.c:213
>> [<ffffffff813a79c7>] exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:218
>> [<ffffffff8134bb69>] do_exit+0x8b9/0x2b80 kernel/exit.c:749
>> [<ffffffff8134dfa8>] do_group_exit+0x108/0x330 kernel/exit.c:880
>> [<ffffffff81371214>] get_signal+0x5e4/0x1500 kernel/signal.c:2307
>> [<ffffffff81193db3>] do_signal+0x83/0x1c90 arch/x86/kernel/signal.c:712
>> [<ffffffff81006685>] exit_to_usermode_loop+0x1a5/0x210
>> arch/x86/entry/common.c:247
>> [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:282
>> [<ffffffff8100851a>] syscall_return_slowpath+0x2ba/0x340
>> arch/x86/entry/common.c:344
>> [<ffffffff85e8dba2>] int_ret_from_sys_call+0x25/0x9f
>> arch/x86/entry/entry_64.S:281
>> ---[ end trace 93985d046a082b72 ]---
>
>
> And this:
>
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 24027 at kernel/events/core.c:2744
> ctx_sched_in+0x253/0x1390()
> Modules linked in:
> CPU: 3 PID: 24027 Comm: syz-executor Not tainted 4.4.0+ #224
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> 00000000ffffffff ffff8800620079d0 ffffffff8290d92d 0000000000000000
> ffff880065652f80 ffffffff85fc05c0 ffff880062007a10 ffffffff813429e9
> ffffffff8161bd13 ffffffff85fc05c0 0000000000000ab8 0000000000000003
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff8290d92d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
> [<ffffffff813429e9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:483
> [<ffffffff81342c19>] warn_slowpath_null+0x29/0x30 kernel/panic.c:516
> [<ffffffff8161bd13>] ctx_sched_in+0x253/0x1390 kernel/events/core.c:2744
> [<ffffffff8161ce97>] perf_event_sched_in+0x47/0xa0 kernel/events/core.c:2106
> [<ffffffff8161cfcc>] ctx_resched+0xdc/0x1d0 kernel/events/core.c:2119
> [<ffffffff8161db6b>] __perf_install_in_context+0x12b/0x1b0
> kernel/events/core.c:2150
> [<ffffffff8160c36f>] remote_function+0x12f/0x1b0 kernel/events/core.c:74
> [<ffffffff814eab83>] generic_exec_single+0x253/0x450 kernel/smp.c:156
> [<ffffffff814eaf96>] smp_call_function_single+0x216/0x340 kernel/smp.c:300
> [<ffffffff81606b7c>] task_function_call+0x11c/0x130 kernel/events/core.c:101
> [<ffffffff81611bc8>] perf_install_in_context+0x318/0x640
> kernel/events/core.c:2193
> [<ffffffff81628a20>] SYSC_perf_event_open+0x1720/0x1fa0
> kernel/events/core.c:8457
> [<ffffffff8162fad9>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8153
> [<ffffffff85e8da36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> ---[ end trace 3930c39e9993a16d ]---
Whole assortment:
2016/01/12 11:57:23 qemu-11: saving crash 'WARNING: CPU: 0 PID: 12905
at kernel/events/core.c:2651 task_ctx_sched_out+0x8d/0xa0()' to
crash-qemu-11-1452596243752451147
2016/01/12 11:59:50 qemu-0: saving crash 'WARNING: CPU: 3 PID: 24027
at kernel/events/core.c:2744 ctx_sched_in+0x253/0x1390()' to
crash-qemu-0-1452596390210686680
2016/01/12 12:05:30 qemu-18: saving crash 'WARNING: CPU: 2 PID: 10497
at kernel/events/core.c:2145 __perf_install_in_context+0x177/0x1b0()'
to crash-qemu-18-1452596730362194373
2016/01/12 12:08:26 qemu-12: saving crash 'WARNING: CPU: 2 PID: 26946
at kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-12-1452596906889159952
2016/01/12 12:11:09 qemu-3: saving crash 'WARNING: CPU: 1 PID: 27863
at kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-3-1452597069411271690
2016/01/12 12:13:38 qemu-5: saving crash 'WARNING: CPU: 1 PID: 18036
at kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-5-1452597218740474180
2016/01/12 12:14:00 qemu-2: saving crash 'WARNING: CPU: 1 PID: 27287
at kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-2-1452597240810824249
2016/01/12 12:14:21 qemu-13: saving crash 'WARNING: CPU: 0 PID: 24151
at kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-13-1452597261384610361
2016/01/12 12:15:40 qemu-12: saving crash 'WARNING: CPU: 2 PID: 1802
at kernel/events/core.c:1698 event_sched_out.isra.86+0x79c/0xf40()' to
crash-qemu-12-1452597340496838168
2016/01/12 12:16:17 qemu-1: saving crash 'WARNING: CPU: 2 PID: 5975 at
kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-1-1452597377189059606
2016/01/12 12:16:51 qemu-2: saving crash 'WARNING: CPU: 2 PID: 8948 at
kernel/events/core.c:1698 event_sched_out.isra.86+0x79c/0xf40()' to
crash-qemu-2-1452597411120105533
2016/01/12 12:17:56 qemu-1: saving crash 'WARNING: CPU: 2 PID: 30319
at kernel/events/core.c:1698 event_sched_out.isra.86+0x79c/0xf40()' to
crash-qemu-1-1452597476734445707
2016/01/12 12:18:52 qemu-7: saving crash 'WARNING: CPU: 1 PID: 31877
at kernel/events/core.c:213 event_function+0x4ba/0x590()' to
crash-qemu-7-1452597532508063377
2016/01/12 12:19:07 qemu-13: saving crash 'WARNING: CPU: 1 PID: 964 at
kernel/events/core.c:2145 __perf_install_in_context+0x177/0x1b0()' to
crash-qemu-13-1452597547020900569
Peter,
What do you think if we work on making syzkaller work for you locally?
I think it will be more efficient than testing on my side. I am ready
to provide any necessary help. If you are interested please drop a
email to syzkaller@xxxxxxxxxxxxxxxxx