Re: [PATCH] perfcounters: Make s/w counters in a group only count when group is on

From: Peter Zijlstra
Date: Mon Mar 16 2009 - 05:58:23 EST


On Sat, 2009-03-14 at 09:41 +1100, Paul Mackerras wrote:
> Peter Zijlstra writes:
>
> > The issue I have with your approach is two-fold:
> > - it breaks the symmetry between software and hardware counters by
> > treating them differently.
>
> So... I was about to restore that symmetry by implementing lazy PMU
> context switching. In the case where we have inherited counters, and
> we are switching from one task to another that both have the same set
> of inherited counters, we don't really need to do anything, because it
> doesn't matter which set of counters the events get added into,
> because they all get added together at the end anyway.

That is only true for actual counting counters, not the sampling kind.
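
To make the distinction concrete, here is a toy user-space model. It is
only a sketch, not kernel code: toy_counter, toy_result and toy_run are
made-up names, and the schedule and event numbers are arbitrary. With
"lazy" set, the counter that is already loaded is left in place across
a switch between the two tasks. The summed count comes out the same
either way, but some time slices get charged to the other task's
counter, which is exactly what a sampling counter could not tolerate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not the kernel's data structures. */
struct toy_counter {
	uint64_t count;		/* events accumulated in this counter */
	int owner;		/* task this counter nominally belongs to */
};

struct toy_result {
	uint64_t total;			/* sum over both inherited counters */
	unsigned int misattributed;	/* slices charged to the wrong counter */
};

/*
 * Replay a fixed schedule of two tasks (0 and 1) that share the same
 * set of inherited counters.  If lazy is non-zero, the loaded counter
 * is never swapped on a context switch.
 */
static struct toy_result toy_run(int lazy)
{
	static const int sched[] = { 0, 1, 0, 1, 1, 0 };
	static const uint64_t events[] = { 5, 3, 7, 2, 4, 1 };
	struct toy_counter c[2] = { { 0, 0 }, { 0, 1 } };
	struct toy_result res = { 0, 0 };
	int loaded = 0;
	size_t i;

	for (i = 0; i < sizeof(sched) / sizeof(sched[0]); i++) {
		int task = sched[i];

		if (!lazy)
			loaded = task;	/* eager: switch counters with the task */

		c[loaded].count += events[i];

		/* A sample taken now would be tagged with c[loaded].owner. */
		if (c[loaded].owner != task)
			res.misattributed++;
	}

	/* Inherited counters are added together at the end. */
	res.total = c[0].count + c[1].count;
	return res;
}
```

Calling toy_run(0) and toy_run(1) and comparing the two results shows
equal totals but a non-zero misattributed count only in the lazy case.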

> That is another situation where you can have counters that are active
> when their associated task is not scheduled in, this time for hardware
> counters as well as software counters. So this is not just some weird
> special case for software counters, but is actually going to be more
> generally useful.
>
> > - it doesn't make much conceptual sense to me
>
> It seems quite reasonable to me that things could happen that are
> attributable to a task, but which happen when the task isn't running.
> Not just context switches and migrations - there's a whole class of
> things that the system does on behalf of a process that can happen
> asynchronously. I wouldn't want to say that those kinds of things can
> never be counted with software counters.

I've been thinking too much about sampling, I think. In that light it
makes absolutely no sense to have events that occur when the task isn't
running, quite simply because it's impossible to relate them to whatever
the task is doing at that moment.

However for simple counting events it might make sense to have something
like that.

Still, HW counters can simply never do anything like that, and the lazy
PMU thing you propose, while cool for simple stuff like perfstat, is
something altogether different -- it doesn't keep counters enabled
while their task is gone from the cpu; rather, it avoids a counter
update between related tasks.

> > For the context switch counter, we could count the event right before we
> > schedule out, which would make it behave like expected.
> >
> > The same for task migration, most migrations happen when they are in
> > fact running, so there too we can account the migration either before we
> > rip it off the src cpu, or after we place it on the dst cpu.
> >
> > There are a few places where this isn't quite so, like affine wakeups,
> > but there we can account after the placement.
>
> Right - but how do you know whether to do that accounting or not? At
> the moment there simply isn't enough state information in the counter
> to tell you whether or not you should be adding in those things that
> happened while the task wasn't running. At the moment you can't tell
> whether a counter is inactive merely because its task is scheduled
> out, or because it's in a group that won't currently fit on the PMU.

Well, for things like the migration count it's easy: we already always
count those, so fudging the perf counter interface isn't too hard. Other
things, yeah, that'll be a tad tricky.
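
For the tricky cases, the missing state could be pictured roughly as
below. This is purely a sketch under my own assumptions: the enum
values, toy_swcounter and toy_account_offcpu are hypothetical names and
not part of the perf counter interface. The point is only that "task
scheduled out" and "group doesn't fit on the PMU" need to be
distinguishable before an off-cpu event can be accounted correctly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical states; the real counter does not carry this today. */
enum toy_state {
	TOY_ACTIVE,	/* task running, group on the PMU */
	TOY_TASK_OUT,	/* inactive only because the task is scheduled out */
	TOY_GROUP_OFF,	/* inactive because the group doesn't fit on the PMU */
};

struct toy_swcounter {
	enum toy_state state;
	uint64_t count;
};

/*
 * Account n events that happened on behalf of the task while it was
 * not running (e.g. a migration).  They are added unless the counter's
 * group is currently off; returns whether they were counted.
 */
static bool toy_account_offcpu(struct toy_swcounter *c, uint64_t n)
{
	if (c->state == TOY_GROUP_OFF)
		return false;

	c->count += n;
	return true;
}
```

With that split, a counter in TOY_TASK_OUT still picks up the event,
while one in TOY_GROUP_OFF leaves it uncounted, which is the behaviour
the patch subject asks for.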

> By the way, I notice that x86 will do the wrong thing if you have a
> group where the leader is an interrupting hardware counter with
> record_type == PERF_RECORD_GROUP and there is a software counter in
> the group, because perf_handle_group calls x86_perf_counter_update on
> each group member unconditionally, and x86_perf_counter_update assumes
> its argument is a hardware counter.

Ah, right, I fixed that for the generic swcounter stuff but then didn't
do the x86 part.. d'oh.
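
The shape of the fix is roughly the following. Again this is a toy
user-space sketch, not the actual x86 code: toy_ops, toy_counter and
the toy_*_update functions are invented for illustration, and the real
structures differ. The idea is that the group walk dispatches through
a per-counter operation instead of assuming every member is a hardware
counter.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_counter;

/* Hypothetical per-counter operations table. */
struct toy_ops {
	void (*update)(struct toy_counter *c);
};

struct toy_counter {
	const struct toy_ops *ops;
	uint64_t count;		/* value reported to the user */
	uint64_t hw_raw;	/* stand-in for a PMU register value */
	uint64_t sw_pending;	/* software events not yet folded in */
};

/* Hardware flavour: refresh count from the (pretend) PMU register. */
static void toy_hw_update(struct toy_counter *c)
{
	c->count = c->hw_raw;
}

/* Software flavour: fold pending events in, never touch the PMU. */
static void toy_sw_update(struct toy_counter *c)
{
	c->count += c->sw_pending;
	c->sw_pending = 0;
}

static const struct toy_ops toy_hw_ops = { toy_hw_update };
static const struct toy_ops toy_sw_ops = { toy_sw_update };

/* Walk the group and let each member update itself. */
static void toy_group_update(struct toy_counter **members, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		members[i]->ops->update(members[i]);
}
```

A group holding one counter with toy_hw_ops and one with toy_sw_ops can
then be passed to toy_group_update() without the software member ever
being treated as hardware.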


