Re: [tip:perfcounters/core] perf_counter: Optimize context switch between identical inherited contexts

From: Ingo Molnar
Date: Mon May 25 2009 - 02:54:46 EST



* Paul Mackerras <paulus@xxxxxxxxx> wrote:

> Ingo Molnar writes:
>
> > * tip-bot for Paul Mackerras <paulus@xxxxxxxxx> wrote:
> >
> > > @@ -885,6 +934,16 @@ void perf_counter_task_sched_out(struct task_struct *task, int cpu)
> > >
> > >  	regs = task_pt_regs(task);
> > >  	perf_swcounter_event(PERF_COUNT_CONTEXT_SWITCHES, 1, 1, regs, 0);
> > > +
> > > +	next_ctx = next->perf_counter_ctxp;
> > > +	if (next_ctx && context_equiv(ctx, next_ctx)) {
> > > +		task->perf_counter_ctxp = next_ctx;
> > > +		next->perf_counter_ctxp = ctx;
> > > +		ctx->task = next;
> > > +		next_ctx->task = task;
> > > +		return;
> > > +	}
> >
> > there's one complication that this trick is causing - the migration
> > counter relies on ctx->task to get per task migration stats:
> >
> > static inline u64 get_cpu_migrations(struct perf_counter *counter)
> > {
> > 	struct task_struct *curr = counter->ctx->task;
> >
> > 	if (curr)
> > 		return curr->se.nr_migrations;
> > 	return cpu_nr_migrations(smp_processor_id());
> > }
> >
> > as ctx->task is now jumping (while we keep the context), the
> > migration stats are out of whack.
>
> How did you notice this? The overall sum over all children should
> still be correct, though some individual children's counters could
> go negative, so the result of a read on the counter when some
> children have exited and others haven't could look a bit strange.
> Reading the counter after all children have exited should be fine,
> though.

I noticed a few weirdnesses, then added a debug check and saw the
negative delta values.
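
For illustration, here is a minimal userspace sketch of how the deltas
can go negative while the overall sum stays right. It is not kernel
code: struct task, struct ctx, ctx_enable(), ctx_update() and
ctx_swap() are hypothetical stand-ins. It assumes the counter is
updated by sampling the owning task's migration count and subtracting
the previously sampled value. Once ctx->task has been swapped, the new
sample and the previous one come from different tasks:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-task migration statistic. */
struct task {
	uint64_t nr_migrations;
};

/* Hypothetical stand-in for a counter context with one migration counter. */
struct ctx {
	struct task *task;	/* task the context currently points at */
	uint64_t prev_count;	/* last sampled value */
	int64_t count;		/* accumulated deltas */
};

/* Take the initial sample when the counter is enabled. */
static void ctx_enable(struct ctx *c)
{
	c->prev_count = c->task->nr_migrations;
}

/* Sample the current owner, accumulate and return the delta. */
static int64_t ctx_update(struct ctx *c)
{
	uint64_t now = c->task->nr_migrations;
	int64_t delta = (int64_t)(now - c->prev_count);

	c->prev_count = now;
	c->count += delta;
	return delta;
}

/* Model of the optimized switch: equivalent contexts trade owners. */
static void ctx_swap(struct ctx *a, struct ctx *b)
{
	struct task *tmp;

	assert(a && b);
	tmp = a->task;
	a->task = b->task;
	b->task = tmp;
}
```

With task A at 10 and task B at 3 migrations when the counters are
enabled, then two more migrations for A and one for B before the swap,
the two deltas come out as -6 and 9. Their sum is the expected 3.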

> One of the effects of optimizing the context switch is that in
> general, reading the value of an inheritable counter when some
> children have exited but some are still running might produce
> results that include some of the activity of the still-running
> children and might not include all of the activity of the children
> that have exited. If that's a concern then we need to implement
> the "sync child counters" ioctl that has been suggested.
>
> As for the migration counter, it is the only software counter that
> is still using the "old" approach, i.e. it doesn't generate
> interrupts and it uses the counter->prev_state field (which I hope
> to eliminate one day). It's also the only software counter which
> counts events that happen while the task is not scheduled in. The
> cleanest thing would be to rewrite the migration counter code to
> have a callin from the scheduler when migrations happen.
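
A rough userspace sketch of what such a callin could look like, under
the assumption that the scheduler calls a hook on every migration and
the hook credits whichever context is attached to the task at that
moment. All names here (struct mig_ctx, struct mig_task, mig_event(),
mig_swap()) are hypothetical and do not exist in the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical context holding an event-driven migration count. */
struct mig_ctx {
	uint64_t migrations;	/* events counted into this context */
};

/* Hypothetical task with a pointer to its currently attached context. */
struct mig_task {
	struct mig_ctx *ctxp;
};

/* Hypothetical scheduler hook: called when a task moves to another CPU. */
static void mig_event(struct mig_task *t)
{
	assert(t);
	if (t->ctxp)
		t->ctxp->migrations++;
}

/* Model of the optimized switch: two equivalent contexts trade owners. */
static void mig_swap(struct mig_task *prev, struct mig_task *next)
{
	struct mig_ctx *tmp = prev->ctxp;

	prev->ctxp = next->ctxp;
	next->ctxp = tmp;
}
```

Since each event is added to a context as it happens, no previous
sample is compared against a different task, so a per-context count
cannot go negative in this model. A single context may still hold
events from both tasks after a swap, but the sum over the equivalent
contexts is unchanged.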

I'll check again with the debug check removed. If the end result is
OK then I don't think we need to worry much about this at this
stage.

Ingo