Re: [PATCH 2/2] perf_counter: optimize context switch between identical inherited contexts

From: Peter Zijlstra
Date: Mon May 25 2009 - 07:27:20 EST


On Mon, 2009-05-25 at 21:06 +1000, Paul Mackerras wrote:
> Peter Zijlstra writes:
>
> > I'm currently staring at something like the below, trying to find races
> > etc.. ;-)
> [snip]
> > next_ctx = next->perf_counter_ctxp;
> > if (next_ctx && context_equiv(ctx, next_ctx)) {
> > + ctx->task = NULL;
> > + next_ctx->task = NULL;
>
> Trouble is, ctx->task == NULL is used as an indication that this is a
> per-cpu context in various places.

Yeah, already realized that; it's enough to simply point them at the new
tasks before flipping the context pointers.

> Also, in find_get_context, we have a lifetime problem because *ctx
> could get swapped and then freed underneath us immediately after we
> read task->perf_counter_ctxp. So we need a lock in the task_struct
> that stops sched_out from swapping the context underneath us. That
> led me to the patch below, which I'm about to test... :) It does the
> unclone in find_get_context; we don't actually need it on remove
> because we have no way to remove an inherited counter from a task
> without the task exiting.

Right, I was trying to solve the lifetime issues with RCU and refcount
tricks and the races with ordering, instead of adding another lock.

> @@ -932,12 +927,32 @@ void perf_counter_task_sched_out(struct task_struct *task,
> regs = task_pt_regs(task);
> perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
>
> + /*
> + * Note: task->perf_counter_ctxp and next->perf_counter_ctxp
> + * can't change underneath us here if we see them non-NULL,
> + * because this is the only place where we change which
> + * context a task points to, and the scheduler will ensure
> + * that this code isn't being called for either of these tasks
> + * on any other cpu at the same time.
> + */
> next_ctx = next->perf_counter_ctxp;
> if (next_ctx && context_equiv(ctx, next_ctx)) {
> + /*
> + * Lock the contexts of both the old and new tasks so that we
> + * don't change things underneath find_get_context etc.
> + * We don't need to be careful about the order in which we
> + * lock the tasks because no other cpu could be trying to lock
> + * both of these tasks -- this is the only place where we lock
> + * two tasks.
> + */
> + spin_lock(&task->perf_counter_ctx_lock);
> + spin_lock(&next->perf_counter_ctx_lock);
> task->perf_counter_ctxp = next_ctx;
> next->perf_counter_ctxp = ctx;
> ctx->task = next;
> next_ctx->task = task;
> + spin_unlock(&next->perf_counter_ctx_lock);
> + spin_unlock(&task->perf_counter_ctx_lock);
> return;
> }

FWIW that nested lock will make lockdep complain -- it can't actually
deadlock, since we're under rq->lock and the tasks can't be stolen from
the rq in that case. You can silence lockdep by using
spin_lock_nest_lock().

> @@ -1067,6 +1082,7 @@ static void perf_counter_cpu_sched_in(struct perf_cpu_context *cpuctx, int cpu)
> __perf_counter_sched_in(ctx, cpuctx, cpu);
> }
>
> +// XXX doesn't do inherited counters too?
> int perf_counter_task_enable(void)
> {
> struct perf_counter *counter;

Good point... perf_counter_for_each_child(counter, perf_counter_enable)
(and similarly for the disable path) should fix that, I think.

> @@ -1360,7 +1389,7 @@ static int perf_release(struct inode *inode, struct file *file)
> put_task_struct(counter->owner);
>
> free_counter(counter);
> - put_context(ctx);
> + put_context(ctx); // XXX use after free?
>
> return 0;
> }

Don't think so, but I will have a look.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/