Re: Perf hotplug lockup in v4.9-rc8
From: Mark Rutland
Date: Wed Jan 11 2017 - 11:27:14 EST
On Wed, Jan 11, 2017 at 05:03:58PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 11, 2017 at 02:59:20PM +0000, Mark Rutland wrote:
> > On Fri, Dec 09, 2016 at 02:59:00PM +0100, Peter Zijlstra wrote:
> > > + * If we get a false negative, things are complicated. If we are after
> > > + * perf_event_context_sched_in() ctx::lock will serialize us, and the
> > > + * value must be correct. If we're before, it doesn't matter since
> > > + * perf_event_context_sched_in() will program the counter.
> > > + *
> > > + * However, this hinges on the remote context switch having observed
> > > + * our task->perf_event_ctxp[] store, such that it will in fact take
> > > + * ctx::lock in perf_event_context_sched_in().
> >
> > Sorry if I'm being thick here, but which store are we describing above?
> > i.e. which function, how does that relate to perf_install_in_context()?
>
> The only store to perf_event_ctxp[] of interest is the initial one in
> find_get_context().
Ah, I see. I'd missed the rcu_assign_pointer() when looking around for
an assignment.
> > I haven't managed to wrap my head around why this matters. :/
>
> See the scenario from:
>
> https://lkml.kernel.org/r/20161212124228.GE3124@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> Its installing the first event on 't', which concurrently with the
> install gets migrated to a third CPU.
I was completely failing to consider that this was the installation of
the first event; I should have read the existing comment. Things make a
lot more sense now.
> CPU0 CPU1 CPU2
>
> (current == t)
>
> t->perf_event_ctxp[] = ctx;
> smp_mb();
> cpu = task_cpu(t);
>
> switch(t, n);
> migrate(t, 2);
> switch(p, t);
>
> ctx = t->perf_event_ctxp[]; // must not be NULL
>
> smp_function_call(cpu, ..);
>
> generic_exec_single()
> func();
> spin_lock(ctx->lock);
> if (task_curr(t)) // false
>
> add_event_to_ctx();
> spin_unlock(ctx->lock);
>
> perf_event_context_sched_in();
> spin_lock(ctx->lock);
> // sees event
>
>
>
> So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing.
> Because if CPU2's load of that variable were to observe NULL, it would
> not try to schedule the ctx and we'd have a task running without its
> counter, which would be 'bad'.
>
> As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
> first and not see the event yet, then CPU0 must observe task_running()
> and retry. If the install happens first, then we must see the event on
> sched-in and all is well.
I think I follow now. Thanks for bearing with me!
> In any case, I'll try and write a proper Changelog for this...
If it's just the commit message and/or comments changing, feel free to
add:
Tested-by: Mark Rutland <mark.rutland@xxxxxxx>
Thanks,
Mark.