Re: Possible race between CPU hotplug and perf_pmu_migrate_context

From: Mark Rutland
Date: Fri Sep 05 2014 - 13:00:52 EST


On Fri, Sep 05, 2014 at 04:41:43PM +0100, Linus Torvalds wrote:
> On Fri, Sep 5, 2014 at 8:16 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > How horrible is the below patch (performance wise). It does pretty much
> > the same thing except that percpu_rw_semaphore is a lot saner, its
> > read side performance should be minimal in the absence of writes.
>
> Ugh. Why do any locking at all (whether a new 'perf_rwsem' or using
> 'get_online_cpus()').
>
> Wouldn't it be much nicer to just do what memory management routines
> are *supposed* to do, and get a reference count to the context while
> having a pointer to it?
>
> IOW, why doesn't put_event() just have a
>
> get_ctx(ctx);
> ..
> put_ctx(ctx);
>
> around its use of the context pointer? So if the context ends up being
> migrated during this time, it doesn't get freed.

For the duration of put_event(), the event already holds a reference on the
context. That reference is only dropped _after_ we're done dealing with
event->ctx, at the very end of put_event(). Follow the callchain:

put_event(event)
  -> _free_event(event)
    -> __free_event(event)
      -> put_ctx(event->ctx)

As you point out below, the race on event->ctx is the fundamental issue. That
race is what lets the old context's refcount be decremented twice: once by the
migration, and once more by put_event() through a stale event->ctx pointer.
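
To make the double decrement concrete, here is a toy userspace model of the
interleaving. It is only a sketch: struct ctx, struct event and migrate_event()
are simplified stand-ins I made up for illustration, not the kernel's
definitions, and the "race" is replayed in a single thread so the ordering is
fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the refcount is modelled. */
struct ctx {
	int refcount;
};

struct event {
	struct ctx *ctx;
};

static void get_ctx(struct ctx *c)
{
	assert(c != NULL);
	c->refcount++;
}

static void put_ctx(struct ctx *c)
{
	assert(c != NULL);
	/* The real put_ctx() frees the context when the count hits zero. */
	c->refcount--;
}

/* Hypothetical model of migration: move the event's reference to dst. */
static void migrate_event(struct event *e, struct ctx *dst)
{
	put_ctx(e->ctx);
	get_ctx(dst);
	e->ctx = dst;
}

/*
 * Replays the bad ordering:
 *   1. put_event() samples event->ctx,
 *   2. migration re-homes the event,
 *   3. put_event() drops its reference through the stale pointer.
 * Returns the old context's refcount; the new context's refcount is
 * stored in *dst_refcount.
 */
int stale_put_old_refcount(int *dst_refcount)
{
	struct ctx src = { .refcount = 1 };	/* the event's own reference */
	struct ctx dst = { .refcount = 0 };
	struct event ev = { .ctx = &src };

	struct ctx *stale = ev.ctx;	/* step 1 */
	migrate_event(&ev, &dst);	/* step 2 */
	put_ctx(stale);			/* step 3 */

	*dst_refcount = dst.refcount;
	return src.refcount;
}
```

In this model the old context ends up one reference short, while the new
context keeps a reference that nothing will ever drop. Wrapping the access in
an extra get_ctx()/put_ctx() pair would not change that, because the extra pair
would operate on the same stale pointer.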

> However, the more fundamental question is "what protects accesses to
> 'events->ctx'". Why is "put_event()" so special that *it* gets locking
> for the reading of "event->ctx", but none of the other cases of
> reading the ctx pointer gets it or needs it?

The key point is that put_event() currently gets no such locking, which is
precisely what this patch attempted to correct.

Regardless, you're right that other uses of event->ctx are just as broken. What
perf_pmu_migrate_context() failed to take into account is that it is possible
to access an event without going via its owning context and holding ctx->mutex.

> I'm getting the feeling that this race is bigger than just put_event().

We certainly have at least one more race: for event groups, perf_read() can
lock a stale context.
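
A rough sketch of that second problem, again as a single-threaded toy model
rather than kernel code (struct lk_ctx, struct lk_event and the boolean "lock"
are hypothetical simplifications): the reader samples event->ctx, the event is
migrated, and the reader then takes the lock of a context that no longer owns
the event.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy context whose "mutex" is just a flag; enough to show which lock is held. */
struct lk_ctx {
	bool locked;
};

struct lk_event {
	struct lk_ctx *ctx;
};

static void lk_lock(struct lk_ctx *c)
{
	assert(!c->locked);
	c->locked = true;
}

static void lk_unlock(struct lk_ctx *c)
{
	assert(c->locked);
	c->locked = false;
}

/*
 * Reader samples event->ctx and then locks it. If a migration slips in
 * between the two steps, the lock taken belongs to the old context.
 * Returns true when the held lock is the one currently covering the event.
 */
bool reader_holds_owning_lock(bool migrate_in_between)
{
	struct lk_ctx src = { .locked = false };
	struct lk_ctx dst = { .locked = false };
	struct lk_event ev = { .ctx = &src };

	struct lk_ctx *sampled = ev.ctx;	/* reader reads the pointer */
	if (migrate_in_between)
		ev.ctx = &dst;			/* event is re-homed */

	lk_lock(sampled);
	bool owning = (sampled == ev.ctx);
	lk_unlock(sampled);

	return owning;
}
```

With the migration in between, the reader holds a lock that no longer
serialises anything against the event's new context.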

Mark.