Re: [RFC][PATCH 00/12] various perf fixes

From: Dmitry Vyukov
Date: Tue Jan 12 2016 - 06:01:51 EST


On Tue, Jan 12, 2016 at 12:00 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Tue, Jan 12, 2016 at 11:57 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> On Tue, Jan 12, 2016 at 11:11 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>>>
>>> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been hunting perf bugs for the past few weeks. This resulted in this pile
>>>> of patches, that mostly seems to work -- I still get an occasional fail so
>>>> something is still off.
>>>>
>>>> But I've been sitting on this for far too long, so here goes.
>>>>
>>>> (I've not at all tested perf-cgroup)
>>>>
>>>> The code compiles in between patches, but I've not bothered trying to run/test
>>>> any intermediate stage. There's just too many inter-related fail.
>>>>
>>>> Alexander, Stephane, could you guys please have a hard look at this?
>>>>
>>>> Andi, Dmitry, patch 10 should explain the getting stuck in
>>>> perf_install_in_context() forever thing you both have observed.
>>>
>>> Btw., if there's no test failures I plan to apply and push this to Linus fairly
>>> soon, so guys please give it all the review and testing you can.
>>
>>
>> Yes, I was able to apply it on top of the latest Linus tree. Thanks.
>> Now testing perf with these patches at full capacity.
>
> Hit this in a minute:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 12905 at kernel/events/core.c:2651
> task_ctx_sched_out+0x8d/0xa0()
> Modules linked in:
> CPU: 0 PID: 12905 Comm: syz-executor Not tainted 4.4.0+ #224
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> 00000000ffffffff ffff88003290f4e8 ffffffff8290d92d 0000000000000000
> ffff8800348c5f00 ffffffff85fc05c0 ffff88003290f528 ffffffff813429e9
> ffffffff81617ffd ffffffff85fc05c0 0000000000000a5b ffff880035c09188
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff8290d92d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
> [<ffffffff813429e9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:483
> [<ffffffff81342c19>] warn_slowpath_null+0x29/0x30 kernel/panic.c:516
> [<ffffffff81617ffd>] task_ctx_sched_out+0x8d/0xa0 kernel/events/core.c:2651
> [< inline >] perf_event_context_sched_out kernel/events/core.c:2550
> [<ffffffff81622263>] __perf_event_task_sched_out+0x5d3/0x1230
> kernel/events/core.c:2634
> [< inline >] perf_event_task_sched_out include/linux/perf_event.h:948
> [< inline >] prepare_task_switch kernel/sched/core.c:2613
> [< inline >] context_switch kernel/sched/core.c:2771
> [<ffffffff85e7d2d7>] __schedule+0xb27/0x1c50 kernel/sched/core.c:3282
> [<ffffffff85e7eed2>] preempt_schedule_common+0x42/0x70 kernel/sched/core.c:3352
> [<ffffffff85e7ef17>] _cond_resched+0x17/0x20 kernel/sched/core.c:4720
> [< inline >] __wait_for_common kernel/sched/completion.c:90
> [< inline >] wait_for_common kernel/sched/completion.c:101
> [<ffffffff85e8158b>] wait_for_completion+0x8b/0x300
> kernel/sched/completion.c:122
> [<ffffffff8149255e>] __wait_rcu_gp+0x12e/0x1a0 kernel/rcu/update.c:365
> [<ffffffff81498218>] synchronize_sched.part.58+0x88/0xb0 kernel/rcu/tree.c:3220
> [<ffffffff814a19a3>] synchronize_sched+0xa3/0x120 kernel/rcu/tree.c:3217
> [< inline >] synchronize_rcu include/linux/rcupdate.h:320
> [<ffffffff817dd6be>] namespace_unlock+0xee/0x100 fs/namespace.c:1349
> [<ffffffff817e671e>] drop_collected_mounts+0x8e/0xa0 fs/namespace.c:1762
> [<ffffffff817eaf1c>] put_mnt_ns+0x4c/0x70 fs/namespace.c:3149
> [<ffffffff813a7594>] free_nsproxy+0x44/0x1d0 kernel/nsproxy.c:161
> [<ffffffff813a7992>] switch_task_namespaces+0xa2/0xc0 kernel/nsproxy.c:213
> [<ffffffff813a79c7>] exit_task_namespaces+0x17/0x20 kernel/nsproxy.c:218
> [<ffffffff8134bb69>] do_exit+0x8b9/0x2b80 kernel/exit.c:749
> [<ffffffff8134dfa8>] do_group_exit+0x108/0x330 kernel/exit.c:880
> [<ffffffff81371214>] get_signal+0x5e4/0x1500 kernel/signal.c:2307
> [<ffffffff81193db3>] do_signal+0x83/0x1c90 arch/x86/kernel/signal.c:712
> [<ffffffff81006685>] exit_to_usermode_loop+0x1a5/0x210
> arch/x86/entry/common.c:247
> [< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:282
> [<ffffffff8100851a>] syscall_return_slowpath+0x2ba/0x340
> arch/x86/entry/common.c:344
> [<ffffffff85e8dba2>] int_ret_from_sys_call+0x25/0x9f
> arch/x86/entry/entry_64.S:281
> ---[ end trace 93985d046a082b72 ]---


And this:

------------[ cut here ]------------
WARNING: CPU: 3 PID: 24027 at kernel/events/core.c:2744
ctx_sched_in+0x253/0x1390()
Modules linked in:
CPU: 3 PID: 24027 Comm: syz-executor Not tainted 4.4.0+ #224
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
00000000ffffffff ffff8800620079d0 ffffffff8290d92d 0000000000000000
ffff880065652f80 ffffffff85fc05c0 ffff880062007a10 ffffffff813429e9
ffffffff8161bd13 ffffffff85fc05c0 0000000000000ab8 0000000000000003
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff8290d92d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff813429e9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:483
[<ffffffff81342c19>] warn_slowpath_null+0x29/0x30 kernel/panic.c:516
[<ffffffff8161bd13>] ctx_sched_in+0x253/0x1390 kernel/events/core.c:2744
[<ffffffff8161ce97>] perf_event_sched_in+0x47/0xa0 kernel/events/core.c:2106
[<ffffffff8161cfcc>] ctx_resched+0xdc/0x1d0 kernel/events/core.c:2119
[<ffffffff8161db6b>] __perf_install_in_context+0x12b/0x1b0
kernel/events/core.c:2150
[<ffffffff8160c36f>] remote_function+0x12f/0x1b0 kernel/events/core.c:74
[<ffffffff814eab83>] generic_exec_single+0x253/0x450 kernel/smp.c:156
[<ffffffff814eaf96>] smp_call_function_single+0x216/0x340 kernel/smp.c:300
[<ffffffff81606b7c>] task_function_call+0x11c/0x130 kernel/events/core.c:101
[<ffffffff81611bc8>] perf_install_in_context+0x318/0x640
kernel/events/core.c:2193
[<ffffffff81628a20>] SYSC_perf_event_open+0x1720/0x1fa0
kernel/events/core.c:8457
[<ffffffff8162fad9>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8153
[<ffffffff85e8da36>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace 3930c39e9993a16d ]---