Re: [PATCH] perf: Fix sibling iteration
From: Mark Rutland
Date: Fri Mar 16 2018 - 08:07:24 EST

On Fri, Mar 16, 2018 at 11:50:17AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 16, 2018 at 11:39:46AM +0100, Jiri Olsa wrote:
> > On Fri, Mar 16, 2018 at 11:31:29AM +0100, Peter Zijlstra wrote:
> > > There is at least one more known issue with that patch, but neither Mark
> > > nor I could reproduce it so far, so we don't know if we're right about the
> > > cause.
> >
> > is there more info about that issue? I could try it
>
> Find below, 0day report didn't go out to lkml. We think moving the
> list_del_init() out from the !RB_NODE_EMPTY() test might fix, but since
> we can't repro so far, it's all guesses.

I've managed to reproduce this using the 0day scripts. From the 0day
logs, it looks like it's possible to hit it ~6% of the time.

I added a WARN_ON(RB_NODE_EMPTY(...)), and I see that fire in the exit
path:

[ 76.287197] perf_remove_from_context+0x9a/0xc0
[ 76.287552] perf_event_release_kernel+0x214/0x3e0
[ 76.287928] ? _raw_spin_unlock+0x8/0x10
[ 76.288237] ? locks_remove_file+0x219/0x230
[ 76.288572] perf_release+0xe/0x20
[ 76.288842] __fput+0x1c9/0x340
[ 76.289089] ____fput+0x8/0x10
[ 76.289332] task_work_run+0x9a/0xd0
[ 76.289613] do_exit+0x6cc/0x1220
[ 76.289877] ? __might_sleep+0xcb/0x150
[ 76.290183] do_group_exit+0x96/0x110
[ 76.290473] get_signal+0x8c3/0xb60
[ 76.290750] ? __perf_event_task_sched_in+0x20d/0x250
[ 76.291143] do_signal+0x19/0x950
[ 76.291408] ? finish_task_switch+0x212/0x480
[ 76.291750] ? __switch_to+0x414/0x730
[ 76.292051] ? _raw_spin_unlock_irqrestore+0x45/0x60
[ 76.292439] ? trace_hardirqs_on+0x63/0x100
[ 76.292769] ? prepare_to_wait_event+0x23a/0x250
[ 76.293132] ? do_int80_syscall_32+0x271/0x290
[ 76.293478] exit_to_usermode_loop+0xb9/0x190
[ 76.293819] do_int80_syscall_32+0x271/0x290
[ 76.294161] entry_INT80_32+0x36/0x36

... then we subsequently hit the initial splat, which looks promisingly
like our initial theory.

However, I don't currently understand how we can see a group leader with
an empty RB node in this path. I can only see that being the case for
siblings that we promoted to being leaders, and those should have an
empty list at the point we promote them...

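To make the theory concrete, here is a rough user-space sketch of the
pattern under discussion. It is not the kernel code: toy_event,
in_tree, entry and the helper names are all made up for illustration,
in_tree merely stands in for the "RB node is not empty" test, and the
thread does not say which list the list_del_init() applies to. It only
shows that if the unlink is guarded by the tree-membership test, an
event that is not in the tree but is still on a list stays linked after
removal, whereas an unconditional unlink clears it. Note that the
sketch deliberately constructs the state that should not be reachable
(not in the tree, yet still on a list); how the kernel gets there is
exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and reinitialise; safe to call on an already-unlinked node. */
static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for an event; not the real struct perf_event. */
struct toy_event {
	bool in_tree;            /* stands in for the "RB node not empty" test */
	struct list_node entry;  /* stands in for the guarded list linkage */
};

/* Guarded variant: the list unlink only happens for events in the tree. */
static void remove_guarded(struct toy_event *e)
{
	if (e->in_tree) {
		e->in_tree = false;
		list_del_init_node(&e->entry);
	}
}

/* Proposed variant: the list unlink is moved out of the test. */
static void remove_unconditional(struct toy_event *e)
{
	if (e->in_tree)
		e->in_tree = false;
	list_del_init_node(&e->entry);
}

/*
 * Build an event that is NOT in the tree but IS on a list, remove it
 * with the given variant, and report whether it is still linked.
 */
static bool stale_after(void (*remove_fn)(struct toy_event *))
{
	struct list_node head;
	struct toy_event e = { .in_tree = false };

	list_init(&head);
	list_init(&e.entry);
	list_add_tail(&e.entry, &head);

	remove_fn(&e);
	return !list_is_empty(&head);
}
```

In this model, stale_after(remove_guarded) reports a leftover entry and
stale_after(remove_unconditional) does not, which is the behaviour the
proposed change is hoping for.
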
Thanks,
Mark.