Re: Q: perf_install_in_context/perf_event_enable are racy?

From: Oleg Nesterov
Date: Thu Jan 27 2011 - 17:27:00 EST

On 01/27, Peter Zijlstra wrote:
> On Thu, 2011-01-27 at 17:57 +0100, Oleg Nesterov wrote:
> >
> > > With, however, things are more interesting. 2 seems to be adequately
> > > covered by the patch I just send, the IPI will bail and the next
> > > sched-in of the relevant task will pick matters up. 1 otoh doesn't seem
> > > covered, the IPI will bail, leaving us stranded.
> >
> > Hmm, yes... Strangely, I missed that when I was thinking about in_ctxsw.
> >
> > Perhaps, we can change task_oncpu_function_call() so that it returns
> > -EAGAIN in case it hits in_ctxsw != 0? If the caller sees -EAGAIN, it
> > should always retry even if !ctx->is_active.
> That would be very easy to do, we can pass a return value through the
> task_function_call structure.


Perhaps task_oncpu_function_call() should retry itself to simplify the
callers. I wonder if we should also retry if rq->curr != p...

Oh. You know, I am starting to think I will never understand this.
Forget about the problems with __ARCH_WANT_INTERRUPT_ON_CTXSW.

perf_install_in_context() does task_oncpu_function_call() and then

// ctx->is_active == 0

* The lock prevents that this context is scheduled in so we
* can add the event safely, if it the call above did not
* succeed.
if (list_empty(&event->group_entry))
add_event_to_ctx(event, ctx);

This assumes that the task is not running.

Why? Because (I guess) we assume that either task_oncpu_function_call()
should see task_curr() == T, or if the task becomes running after that
it should see the new ->perf_event_ctxp[ctxn] != NULL. And I do not see
how we can prove this.

If find_get_context() sets the new context, the only guarantee we have
is: perf_event_exit_task() can't miss this context. The task, however,
can be scheduled in and miss the new value in perf_event_ctxp[].
And, task_oncpu_function_call() can equally miss rq->curr == task.

But. I think this all falls into the absolutely theoretical category,
and in the worst case nothing really bad can happen, just event_sched_in()
will be delayed until this task reshedules.

So, I think your patch should fix all problems with schedule. Just it
needs the couple of changes we already discussed:

- finish_task_switch() should clear rq->in_ctxsw before

- task_oncpu_function_call() (or its callers) should always
retry the "if (task_curr(p))" code if ->in_ctxsw is true.

If you think we have other problems here please don't tell me,
I already got lost ;)


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at