Re: [patch] oprofile for ppc

From: Andrew Fleming (afleming@motorola.com)
Date: Tue Mar 11 2003 - 16:54:12 EST


On Monday, Mar 10, 2003, at 20:14 US/Central, Segher Boessenkool wrote:

> Albert Cahalan wrote:
>> On Sun, 2003-03-09 at 22:50, Segher Boessenkool wrote:
>>> Benjamin Herrenschmidt wrote:
>>>> Beware though that some G4s have a nasty bug that
>>>> prevents using the performance counter interrupt
>>>> (and the thermal interrupt as well).
>>>
>>> MPC7400 version 1.2 and lower have this problem.
>> MPC7410 you mean, right? Are those early revisions
>> even popular?
>
> 7400 and 7410 core versions are identical, afaik. I don't
> think any 7410 core lower than version 2.0 was ever used
> in any consumer machines. ymmv.

I've been looking into this, and all versions of the 7410 before 1.3
(where it was fixed) have this erratum. There is also no version of the
7410 above 1.4. Some of the machines with 7410s, and all of the
machines with 7400s, have this problem, I believe. If nothing else, it
is a security issue if user processes are allowed to configure the
counters (something that would be nice, in terms of usability).

>
>> I'm wondering if the MPC7400 is also affected.
>> The MPC7400 has some significant differences.
>> The pipeline length changed.
>
> Between 7400 and 7410? That's news to me...

There are no significant changes between the 7400 and 7410 pipelines;
the primary difference was the process in which they were fabricated.
You are probably thinking of the 7450 and its successors: the pipeline
changed in that model from 4 to 7 stages (depending on how one defines
"stage").

>
>>>> The problem is that if any of those fall at the same
>>>> time as the DEC interrupt, the CPU messes up its
>>>> internal state and you lose SRR0/SRR1, which means
>>>> you can't recover from the exception.
>>>
>>> But the worst that happens is that you lose that
>>> process, isn't it? Not all that big a problem,
>>> esp. since the window in which this can happen is
>>> very small.
>> I think you'd get an infinite loop of either
>> the decrementer or performance monitor. That's
>> mostly fixable by checking for the condition and
>> killing the affected process, but that process
>> could be one of the ones built into the kernel.
>
> That would be a problem, yes :-(
>
>> So the use of oprofile comes down to a choice:
>> a. Ignore the problem.
>> rare crashes
>
> As long as it's rare, that's not _too_ big of a problem,
> really. Just document it ;)

I suggest a modification of this behavior, which I'll describe at the
end of this email.

>
>> b. The decrementer goes much faster for profiling.
>> high overhead, awkwardness in non-time measurement
>
> Bad idea, I think.
>
>> c. The performance monitor is used for clock ticks.
>> hard choices about sharing or frequency
>
> I'd go for this option.

I don't think either of these is ideal. On most systems the
decrementer is used for generating the timer interrupts that drive
preemption, and other such fun. Messing around with this facility to
work around an erratum in the 7400 seems excessive. And locking down
one of the counters to only count cycles is undesirable: you would lose
the ability to count some events in most implementations of the
counters. As time goes on, the number of people wanting to tune
performance on 300-500MHz 7400/7410 processors will dwindle, but the
complications created by this workaround would haunt us forever.

As I see it, the problem is:
1) If the decrementer and perfmon interrupts occur one after the other
while a process is being profiled on some 7400/7410 processors, that
process's state (in terms of where it is in execution) will be lost.

This can be acceptable, since the PMI handler could detect such a
condition (a return address of 0x900 would be a good hint), and
terminate the offending program. Since nothing is harmed, you just try
again. As long as this behavior, and its cause, is documented (it
could even be detected by the module), this should be acceptable to
people with these processors.
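
As a rough sketch of the check described above (not actual kernel code;
the struct, enum, and function names are all hypothetical, and it
assumes exception vectors live at the low addresses so that the
decrementer vector is exactly 0x900):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architected offset of the decrementer exception vector. */
#define VEC_DECREMENTER 0x900u

/* Hypothetical snapshot of SRR0/SRR1 as seen on entry to the PMI handler. */
struct pmi_frame {
    uint32_t srr0;  /* address the handler would return to */
    uint32_t srr1;  /* saved MSR (not trustworthy once the erratum hits) */
};

enum pmi_action {
    PMI_RESUME,     /* state intact, return normally */
    PMI_KILL_TASK,  /* user context lost, terminate the profiled process */
    PMI_FATAL       /* kernel context may be lost, cannot recover */
};

/*
 * A return address equal to the decrementer vector is taken as the hint
 * that the two interrupts collided and the original SRR0/SRR1 are gone.
 * Because SRR1 is lost too, the only way to know the interrupted context
 * was user space is to know the counters never run in privileged mode.
 */
enum pmi_action classify_pmi(const struct pmi_frame *f, bool counting_in_kernel)
{
    assert(f != NULL);
    if (f->srr0 != VEC_DECREMENTER)
        return PMI_RESUME;
    return counting_in_kernel ? PMI_FATAL : PMI_KILL_TASK;
}
```

With counting restricted to user mode, a clobbered frame maps to
PMI_KILL_TASK, which is the "lose the process and try again" case.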

2) If the same happens while in the kernel on one of those processors,
we have a kernel panic.

This is not, I think, acceptable behavior. Linux shouldn't crash.
However, this should only be a problem if the counters are on in
privileged space. If they don't increment when an interrupt occurs,
they can't cause a PMI. So the solution would be to disallow profiling
the kernel. However, some people want to profile the kernel, and those
processors should not be left out, if possible. What we can do,
though, is use timer-based profiling for the kernel on only those
processors. The processors should be easy to detect. We just need to
make sure not to enable the PMI in the one condition (kernel is being
profiled on 7400/7410 processors before 7410 version 1.3).
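
To make the proposed policy concrete, here is a minimal sketch. The
types and function names are hypothetical, the CPU is described by an
already-decoded family and revision rather than a raw PVR value, and
the set of affected parts follows my belief stated earlier (all 7400s,
and 7410s before 1.3), which should be checked against the errata
sheets:

```c
#include <stdbool.h>

/* Hypothetical, already-decoded CPU identification. */
enum cpu_family { CPU_7400, CPU_7410, CPU_OTHER };

struct cpu_id {
    enum cpu_family family;
    unsigned major;  /* e.g. 1 for version 1.3 */
    unsigned minor;  /* e.g. 3 for version 1.3 */
};

enum profile_source { SRC_PMI, SRC_TIMER };

/* Assumption: every 7400 is affected; the 7410 is fixed from 1.3 on. */
bool pmi_erratum_present(struct cpu_id c)
{
    switch (c.family) {
    case CPU_7400:
        return true;
    case CPU_7410:
        return c.major < 1 || (c.major == 1 && c.minor < 3);
    default:
        return false;
    }
}

/*
 * Only the combination "kernel profiling on an affected part" falls back
 * to timer-based sampling; everything else keeps the PMI.
 */
enum profile_source pick_source(struct cpu_id c, bool profiling_kernel)
{
    if (profiling_kernel && pmi_erratum_present(c))
        return SRC_TIMER;
    return SRC_PMI;
}
```

User-space profiling on an affected part still gets the PMI here, and
relies on detecting and killing a process whose state was lost.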

Any thoughts on this solution?

Andy Fleming

PowerPC Software Enablement
Motorola, Inc

Note that my opinions are not Motorola's, even the good ones!
