Re: [PATCH 4/7] perf: Free aux pages in unmap path

From: Alexander Shishkin
Date: Wed Dec 09 2015 - 04:58:25 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> Yuck, nasty problem. Also, I think it's broken. By not having
> mmap_mutex around the whole thing, notably rb_free_aux(), you can race
> against mmap().
>
> What seems possible now is that:
>
> 	mmap(aux);	// rb->aux_mmap_count == 1
> 	munmap(aux)
> 	  atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex); // == 0
>
> 	  mutex_unlock(&event->mmap_mutex);
>
> 					mmap(aux)
> 					  if (rb_has_aux())
> 					    atomic_inc(&rb->aux_mmap_count); // == 1
>
> 	  rb_free_aux(); // oops!!

Wait, this isn't actually a problem: we can hold mmap_mutex over
rb_free_aux(), as the current code already does. My patch got this
wrong, but there's really no reason to drop the mutex before
rb_free_aux().
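
To illustrate why keeping the mutex held across the free closes that
window, here is a rough userspace model. It is only a sketch under
stated assumptions: struct model_rb and all model_* helpers are made-up
stand-ins built on pthreads and C11 atomics, not the kernel code, and
model_dec_and_mutex_lock() merely mimics the contract of
atomic_dec_and_mutex_lock() (nonzero return means the count hit zero and
the lock is held).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the ring buffer's AUX state. */
struct model_rb {
	pthread_mutex_t mmap_mutex;	/* plays the role of event->mmap_mutex */
	atomic_int aux_mmap_count;	/* plays the role of rb->aux_mmap_count */
	void *aux_pages;		/* non-NULL while the AUX area exists */
};

static void model_init(struct model_rb *rb)
{
	pthread_mutex_init(&rb->mmap_mutex, NULL);
	atomic_init(&rb->aux_mmap_count, 0);
	rb->aux_pages = NULL;
}

/* Decrement *cnt; if it reaches zero, return true with the lock held. */
static bool model_dec_and_mutex_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int c = atomic_load(cnt);

	/* Fast path: decrement locklessly while we cannot possibly hit zero. */
	while (c != 1) {
		if (atomic_compare_exchange_weak(cnt, &c, c - 1))
			return false;
	}

	/* We might hit zero, so take the lock before the final decrement. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) != 1) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;
}

/* mmap() side: allocate the area or take another reference, under the lock. */
static bool model_mmap_aux(struct model_rb *rb, size_t size)
{
	bool ok;

	pthread_mutex_lock(&rb->mmap_mutex);
	if (rb->aux_pages) {
		atomic_fetch_add(&rb->aux_mmap_count, 1);
	} else {
		rb->aux_pages = malloc(size);
		if (rb->aux_pages)
			atomic_store(&rb->aux_mmap_count, 1);
	}
	ok = rb->aux_pages != NULL;
	pthread_mutex_unlock(&rb->mmap_mutex);
	return ok;
}

/* munmap() side: returns true if this call freed the AUX area. */
static bool model_munmap_aux(struct model_rb *rb)
{
	if (!model_dec_and_mutex_lock(&rb->aux_mmap_count, &rb->mmap_mutex))
		return false;

	/*
	 * The lock is still held, so no mmap can re-take a reference
	 * between the final decrement and the free.
	 */
	assert(atomic_load(&rb->aux_mmap_count) == 0);
	free(rb->aux_pages);
	rb->aux_pages = NULL;
	pthread_mutex_unlock(&rb->mmap_mutex);
	return true;
}
```

In this model, dropping the mutex before the free() would allow exactly
the interleaving Peter describes: a second model_mmap_aux() could see
aux_pages still set and bump the count just before the pages go away.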

> So I thought that pulling all the aux bits out from the ring_buffer
> struct, such that we have rb->aux, would solve the issue in that we can
> then fix mmap() to have the same retry loop as for event->rb.
>
> And while that fixes that race (I almost had that patch complete -- I
> might still send it out, just so you can see what it looks like), it
> doesn't solve the complete problem I don't think.

I was toying with that some time ago, but I couldn't really see the
benefits that would justify the hassle.

> Because in that case, you want the event to start again on the new
> buffer, and I think it's possible we end up calling ->start() before
> we've issued the ->stop() and that would be BAD (tm).

So if we just hold the mmap_mutex over rb_free_aux(), this won't
happen, right?

> The only solution I've come up with is:
>
> 	struct rb_aux *aux = rb->aux;
>
> 	if (aux && vma->vm_pgoff == aux->pgoff) {
> 		ctx = perf_event_ctx_lock(event);
> 		if (!atomic_dec_and_mutex_lock(&aux->mmap_count, &event->mmap_mutex)) {
> 			/* we now hold both ctx::mutex and event::mmap_mutex */
> 			rb->aux = NULL;
> 			ring_buffer_put(rb); /* aux had a reference */
> 			_perf_event_stop(event);

Here we really need to ensure that none of the events on the
rb->event_list is running, not just the parent, and that still presents
complications wrt irqsave rb->event_lock even with your new idea for
perf_event_stop().

How about something like this to stop the writers:

static int __ring_buffer_output_stop(void *info)
{
	struct ring_buffer *rb = info;
	struct perf_event *event;

	spin_lock(&rb->event_lock);
	list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
		if (event->state != PERF_EVENT_STATE_ACTIVE)
			continue;

		event->pmu->stop(event, PERF_EF_UPDATE);
	}
	spin_unlock(&rb->event_lock);

	return 0;
}

static void perf_event_output_stop(struct perf_event *event)
{
	struct ring_buffer *rb = event->rb;

	lockdep_assert_held(&event->mmap_mutex);

	if (event->cpu == -1)
		perf_event_stop(event);

	cpu_function_call(event->cpu, __ring_buffer_output_stop, rb);
}
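
The filtering part of that walk can be modelled in plain userspace C.
This is only a sketch: struct model_event, model_pmu_stop() and
model_output_stop() are hypothetical names, the list is a simple singly
linked one, and the locking, RCU and cross-CPU call from the snippet
above are deliberately left out. It shows only that every active writer
on the buffer's list gets stopped and inactive ones are skipped.

```c
#include <assert.h>
#include <stddef.h>

enum model_state { MODEL_INACTIVE = 0, MODEL_ACTIVE = 1 };

/* Hypothetical stand-in for a perf_event attached to a ring buffer. */
struct model_event {
	enum model_state state;
	int stop_calls;			/* times the stop callback ran */
	struct model_event *next;	/* stand-in for the rb_entry link */
};

/* Stand-in for event->pmu->stop(): only valid on an active event. */
static void model_pmu_stop(struct model_event *event)
{
	assert(event->state == MODEL_ACTIVE);
	event->state = MODEL_INACTIVE;
	event->stop_calls++;
}

/* Stop every active event on the list; return how many were stopped. */
static int model_output_stop(struct model_event *head)
{
	int stopped = 0;

	for (struct model_event *e = head; e; e = e->next) {
		if (e->state != MODEL_ACTIVE)
			continue;

		model_pmu_stop(e);
		stopped++;
	}
	return stopped;
}
```

A second call on the same list stops nothing, since no event is left in
the active state.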

And then in the mmap_close:

	if (rb_has_aux(rb) && vma->vm_pgoff == rb->aux_pgoff &&
	    atomic_dec_and_mutex_lock(&rb->aux_mmap_count, &event->mmap_mutex)) {
		perf_event_output_stop(event);

		/* undo the mlock accounting here */

		rb_free_aux(rb);
		mutex_unlock(&event->mmap_mutex);
	}

Regards,
--
Alex