Re: [tip:perfcounters/core] perf_counter: Optimize context switch between identical inherited contexts

From: Paul Mackerras
Date: Mon May 25 2009 - 02:18:55 EST

Ingo Molnar writes:

> * tip-bot for Paul Mackerras <paulus@xxxxxxxxx> wrote:
> > @@ -885,6 +934,16 @@ void perf_counter_task_sched_out(struct task_struct *task, int cpu)
> >
> >         regs = task_pt_regs(task);
> >         perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
> > +
> > +       next_ctx = next->perf_counter_ctxp;
> > +       if (next_ctx && context_equiv(ctx, next_ctx)) {
> > +               task->perf_counter_ctxp = next_ctx;
> > +               next->perf_counter_ctxp = ctx;
> > +               ctx->task = next;
> > +               next_ctx->task = task;
> > +               return;
> > +       }
>
> there's one complication that this trick is causing - the migration
> counter relies on ctx->task to get per task migration stats:
>
> static inline u64 get_cpu_migrations(struct perf_counter *counter)
> {
>         struct task_struct *curr = counter->ctx->task;
>
>         if (curr)
>                 return curr->se.nr_migrations;
>         return cpu_nr_migrations(smp_processor_id());
> }
>
> as ctx->task is now jumping (while we keep the context), the
> migration stats are out of whack.

How did you notice this? The overall sum over all children should
still be correct, though some individual children's counters could go
negative, so the result of a read on the counter when some children
have exited and others haven't could look a bit strange. Reading the
counter after all children have exited should be fine, though.
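
To illustrate why the sum stays right while individual values can go
negative, here is a small user-space sketch. It is only a toy model: the
toy_* structures and functions are made up for this example and are not
the kernel's, and it assumes the counter simply accumulates
"now - prev" of the nr_migrations value of whatever task its context
currently points at.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; not the kernel structures. */
struct toy_ctx;

struct toy_task {
	uint64_t nr_migrations;		/* per-task running total */
	struct toy_ctx *ctxp;		/* context currently attached */
};

struct toy_ctx {
	struct toy_task *task;		/* task the context points back at */
	uint64_t prev;			/* last value sampled */
	int64_t count;			/* accumulated deltas */
};

static void toy_attach(struct toy_task *t, struct toy_ctx *c)
{
	t->ctxp = c;
	c->task = t;
	c->prev = t->nr_migrations;
	c->count = 0;
}

/* Accumulate the change in the current task's total since the last sample. */
static void toy_update(struct toy_ctx *c)
{
	uint64_t now = c->task->nr_migrations;

	c->count += (int64_t)(now - c->prev);
	c->prev = now;
}

/* Same pointer swap as in the quoted patch hunk. */
static void toy_swap(struct toy_task *task, struct toy_task *next)
{
	struct toy_ctx *ctx = task->ctxp;
	struct toy_ctx *next_ctx = next->ctxp;

	assert(ctx && next_ctx);
	task->ctxp = next_ctx;
	next->ctxp = ctx;
	ctx->task = next;
	next_ctx->task = task;
}

struct toy_result {
	int64_t a, b, sum;
};

struct toy_result toy_demo(void)
{
	struct toy_task a = { .nr_migrations = 10 };
	struct toy_task b = { .nr_migrations = 100 };
	struct toy_ctx ca, cb;

	toy_attach(&a, &ca);
	toy_attach(&b, &cb);

	a.nr_migrations += 3;	/* a migrates 3 times */
	b.nr_migrations += 5;	/* b migrates 5 times */

	toy_swap(&a, &b);	/* ca now follows b, cb follows a */
	toy_update(&ca);	/* 105 - 10  =  95 */
	toy_update(&cb);	/* 13 - 100  = -87 */

	return (struct toy_result){ ca.count, cb.count, ca.count + cb.count };
}
```

In this model toy_demo() gives 95 for one context and -87 for the other,
yet the sum is 8, which matches the 3 + 5 migrations that actually
happened.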

One of the effects of optimizing the context switch is that in
general, reading the value of an inheritable counter when some
children have exited but some are still running might produce results
that include some of the activity of the still-running children and
might not include all of the activity of the children that have
exited. If that's a concern then we need to implement the "sync child
counters" ioctl that has been suggested.

As for the migration counter, it is the only software counter that is
still using the "old" approach, i.e. it doesn't generate interrupts
and it uses the counter->prev_state field (which I hope to
eliminate one day). It's also the only software counter which counts
events that happen while the task is not scheduled in. The cleanest
thing would be to rewrite the migration counter code so that the
scheduler calls into it whenever a migration happens.
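
A rough sketch of what such an event-driven version could look like, again
as a user-space toy. The ev_* names are hypothetical, and the hook
ev_task_migrated() stands in for a scheduler call that does not exist
in this form; the point is only that counting at the time of the event
credits whichever context the task holds at that moment, so swapping
contexts never produces a negative delta.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types for illustration; not the kernel structures. */
struct ev_ctx {
	uint64_t migrations;	/* events counted into this context */
};

struct ev_task {
	struct ev_ctx *ctxp;	/* NULL if the task has no counters */
};

/*
 * Hypothetical hook: the scheduler would call this each time it moves
 * the task to another CPU, whether or not the task is scheduled in.
 */
void ev_task_migrated(struct ev_task *t)
{
	assert(t != NULL);
	if (t->ctxp)
		t->ctxp->migrations++;
}

uint64_t ev_demo(void)
{
	struct ev_ctx ca = { 0 }, cb = { 0 };
	struct ev_task a = { &ca }, b = { &cb };
	struct ev_task untracked = { NULL };
	struct ev_ctx *tmp;

	ev_task_migrated(&a);		/* counted in ca */

	/* Context-switch optimization: the two tasks trade contexts. */
	tmp = a.ctxp;
	a.ctxp = b.ctxp;
	b.ctxp = tmp;

	ev_task_migrated(&a);		/* counted in cb */
	ev_task_migrated(&b);		/* counted in ca */
	ev_task_migrated(&untracked);	/* no context, nothing counted */

	return ca.migrations + cb.migrations;
}
```

Here ev_demo() returns 3, one for each migration of a task that has a
context, and neither per-context value can ever decrease.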
