Re: perf: perf_fuzzer lockup in perf_cgroup_attach

From: Stephane Eranian
Date: Thu Sep 15 2016 - 01:35:58 EST


On Wed, Sep 14, 2016 at 7:43 PM, Vince Weaver <vincent.weaver@xxxxxxxxx> wrote:
>
> so the skylake that was fuzzing finally is mostly locked up.
>
> Really hard to tell what's going, especially as KASLR made looking up the
> addresses a big pain.
>
I would think there is a way to disable KASLR for this kind of testing!

Which of your fuzzer scripts are you using? fast_repro99.sh?

> The best I can tell things are getting wedged somehow in
> perf_cgroup_switch() while interrupts are disabled. Interrupts are never
> getting re-enabled, causing the RCU and NMI watchdogs to trigger (and more
> alarming things like the SATA bus resetting).
>
How do you get to perf_cgroup_switch() from the traces you provide below?

> [26292.413603] Task dump for CPU 4:
> [26292.413604] perf_fuzzer R running task 0 8870 1096 0x10000008
> [26292.413605] ffff9045f29e1100 00000000e7143ab0 ffff9045fdd03db8 ffffffff938accef
> [26292.413606] 0000000000000004 0000000000000087 ffff9045fdd03dd0 ffffffff938af5e9
> [26292.413607] 0000000000000004 ffff9045fdd03e00 ffffffff93984928 ffff9045fdd19440
> [26292.413608] Call Trace:
> [26292.413609] <IRQ> [<ffffffff938accef>] sched_show_task+0xaf/0x110
> [26292.413611] [<ffffffff938af5e9>] dump_cpu_task+0x39/0x40
> [26292.413613] [<ffffffff93984928>] rcu_dump_cpu_stacks+0x80/0xbb
> [26292.413614] [<ffffffff938e78ce>] rcu_check_callbacks+0x71e/0x880
> [26292.413615] [<ffffffff9393364c>] ? acct_account_cputime+0x1c/0x20
> [26292.413616] [<ffffffff938b0049>] ? account_system_time+0x79/0x120
> [26292.413617] [<ffffffff938fe8b0>] ? tick_sched_do_timer+0x30/0x30
> [26292.413619] [<ffffffff938ee88f>] update_process_times+0x2f/0x60
> [26292.413619] [<ffffffff938fe285>] tick_sched_handle.isra.13+0x25/0x60
> [26292.413620] [<ffffffff938fe8ed>] tick_sched_timer+0x3d/0x70
> [26292.413621] [<ffffffff938ef633>] __hrtimer_run_queues+0xf3/0x280
> [26292.413623] [<ffffffff938efb08>] hrtimer_interrupt+0xa8/0x1a0
> [26292.413624] [<ffffffff93852d88>] local_apic_timer_interrupt+0x38/0x60
> [26292.413625] [<ffffffff93e1dee4>] smp_trace_apic_timer_interrupt+0x44/0xde
> [26292.413627] [<ffffffff9396ff60>] ? perf_cgroup_attach+0x70/0x70
> [26292.413628] [<ffffffff93e1d232>] trace_apic_timer_interrupt+0x82/0x90
> [26292.413628] <EOI> [<ffffffff9396ff60>] ? perf_cgroup_attach+0x70/0x70
> [26292.413631] [<ffffffff939039d6>] ? smp_call_function_single+0xd6/0x130
> [26292.413632] [<ffffffff9396ff60>] ? perf_cgroup_attach+0x70/0x70
> [26292.413633] [<ffffffff9396ec53>] cpu_function_call+0x43/0x60
> [26292.413634] [<ffffffff93976840>] ? __perf_event_enable+0x260/0x260
> [26292.413635] [<ffffffff939737e1>] perf_install_in_context+0x141/0x150
> [26292.413636] [<ffffffff9397956e>] SYSC_perf_event_open+0x70e/0xfe0
> [26292.413637] [<ffffffff938a99e4>] ? check_preempt_curr+0x54/0x90
> [26292.413639] [<ffffffff9397c099>] SyS_perf_event_open+0x9/0x10
> [26292.413640] [<ffffffff93803bb4>] do_syscall_64+0x64/0x160
> [26292.413641] [<ffffffff93e1b625>] entry_SYSCALL64_slow_path+0x25/0x25
>
> [26316.489382] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [perf_fuzzer:8870]
> [26316.497309] CPU: 4 PID: 8870 Comm: perf_fuzzer Tainted: G W 4.8.0-rc6+ #5
> [26316.497310] Hardware name: LENOVO 10FY0017US/SKYBAY, BIOS FWKT53A 06/06/2016
> [26316.497310] task: ffff9045f29e1100 task.stack: ffff9045f0a48000
> [26316.497311] RIP: 0010:[<ffffffff939039d6>] [<ffffffff939039d6>] smp_call_function_single+0xd6/0x130
> [26316.497312] RSP: 0018:ffff9045f0a4bd30 EFLAGS: 00000202
> [26316.497313] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
> [26316.497313] RDX: 0000000000000001 RSI: 00000000000000fb RDI: 0000000000000286
> [26316.497314] RBP: ffff9045f0a4bd78 R08: ffff9045fdc9bed0 R09: ffff9045f2ac8000
> [26316.497314] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff9396ff60
> [26316.497315] R13: ffff9045fdc9bed0 R14: 000000000000001e R15: ffff9045fdc9bee0
> [26316.497315] FS: 00007f4157a08700(0000) GS:ffff9045fdd00000(0000) knlGS:0000000000000000
> [26316.497316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [26316.497316] CR2: 000000000693a048 CR3: 0000000231049000 CR4: 00000000003407e0
> [26316.497317] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [26316.497317] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [26316.497318] Stack:
> [26316.497318] 0000000000000000 ffff9045ee35da80 0000000000000000 ffffffff9396ff60
> [26316.497319] ffff9045f0a4bd88 0000000000000003 00000000e7143ab0 ffff9045ee47613a
> [26316.497321] ffff9045ee476000 ffff9045f0a4bdb0 ffffffff9396ec53 0000000000000000
> [26316.497322] Call Trace:
> [26316.497323] [<ffffffff9396ff60>] ? perf_cgroup_attach+0x70/0x70
> [26316.497324] [<ffffffff9396ec53>] cpu_function_call+0x43/0x60
> [26316.497325] [<ffffffff93976840>] ? __perf_event_enable+0x260/0x260
> [26316.497326] [<ffffffff939737e1>] perf_install_in_context+0x141/0x150
> [26316.497327] [<ffffffff9397956e>] SYSC_perf_event_open+0x70e/0xfe0
> [26316.497328] [<ffffffff938a99e4>] ? check_preempt_curr+0x54/0x90
> [26316.497330] [<ffffffff9397c099>] SyS_perf_event_open+0x9/0x10
> [26316.497331] [<ffffffff93803bb4>] do_syscall_64+0x64/0x160
> [26316.497332] [<ffffffff93e1b625>] entry_SYSCALL64_slow_path+0x25/0x25
> [26316.497332] Code: 25 28 00 00 00 75 70 48 83 c4 38 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 14 fe ff ff 8b 55 e0 83 e2 01 74 0a f3 90 <8b> 55 e0 83 e2 01 75 f6 eb c3 8b 05 ba e5 dd 00 85 c0 75 85 80
>