Re: [PATCH v1] perf stat: avoid 10ms limit for printing event counts

From: Alexey Budankov
Date: Tue Mar 27 2018 - 12:27:20 EST


On 27.03.2018 14:59, Andi Kleen wrote:
> On Tue, Mar 27, 2018 at 02:40:29PM +0300, Alexey Budankov wrote:
>> On 27.03.2018 12:06, Andi Kleen wrote:
>>>> When running perf stat -I to monitor, e.g., PCIe uncore counters while
>>>> profiling an I/O workload with perf record, e.g., for cpu-cycles and
>>>> context switches, it is possible to build and observe a good-enough
>>>> consolidated CPU/OS/IO (uncore) performance picture for that
>>>> workload.
>>>
>>> At some point I still hope we can make uncore measurements in
>>> perf record work. Kan tried at some point to allow multiple
>>> PMUs in a group, but was not successful. But perhaps we
>>> can sample them from a software event instead.
>>>
>>>>
>>>> The warning about possible runtime overhead is still preserved;
>>>> however, it is only visible when the -v option is specified.
>>>
>>> I would print it unconditionally. Very few people use -v.

I thought it through some more. Printing the warning doesn't make sense
when the output goes to the console, because the screen quickly scrolls
past it. If the interval is small, you may miss it entirely, regardless
of the -v option.

It turns out that the right place to mention the possible overhead is
the help message generated by perf stat -h.

Thanks,
Alexey

>>>
>>> BTW, it would of course be better to occasionally measure the perf
>>> stat CPU time and only print the warning if it's above some
>>> percentage of a CPU. But that would be much more work.
>>
>> Could you please elaborate on that?
>
> getrusage() can give you the system+user time of the current process.
> If you compare that to wall time you know the percentage.
>
> Could measure those occasionally (not every interval, but perhaps
> once per second or so). If the overhead reaches a reasonable percentage (5%
> perhaps?) print the warning once.
>
> One problem is that the measurement doesn't include time spent in the
> remote IPIs for reading performance counters on other CPUs. So the
> larger the system, the less accurate it becomes. But maybe it's a
> good enough proxy.
>
> Or in theory could fix the kernel to charge this somehow to the process
> that triggered the IPIs, but that would be another project.
>
> Another problem is that it doesn't account for burstiness. Maybe
> the problem is not the smoothed average of CPU time, but bursts
> competing with the original workload. There's probably no easy
> solution for that.
>
> Also, if the CPU perf stat runs on is otherwise idle, the overhead of
> course doesn't matter. Detecting that would require reading /proc,
> which would be much more expensive, so probably not a good idea. As a
> proxy you could check the involuntary context switches (also reported
> by getrusage()), and if they don't cross some threshold, don't warn.
>
> -Andi
>
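For reference, here is a minimal sketch of the getrusage() check Andi describes, in C since perf itself is C. It is not perf code: the names (struct overhead_state, overhead_init(), overhead_check(), cpu_percent()) and the thresholds (one check per second, 5% of a CPU, 10 involuntary context switches per check) are hypothetical, picked from the numbers floated in the thread. It compares RUSAGE_SELF user+system time against CLOCK_MONOTONIC wall time, uses ru_nivcsw as the "is the CPU contended" proxy, and warns only once. As noted above, it cannot see time spent in remote IPIs and does not account for burstiness.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical thresholds, not taken from perf. */
#define OVERHEAD_CHECK_INTERVAL_NS 1000000000LL /* check about once per second */
#define OVERHEAD_WARN_PERCENT 5.0               /* % of one CPU */
#define NIVCSW_WARN_THRESHOLD 10                /* involuntary switches per check */

struct overhead_state {
	struct timespec last_wall; /* wall clock at the previous sample */
	long long last_cpu_us;     /* user+system time at the previous sample */
	long last_nivcsw;          /* involuntary context switches so far */
	bool warned;               /* warn only once */
};

static long long tv_to_us(struct timeval tv)
{
	return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

static long long ts_to_ns(struct timespec ts)
{
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Share of one CPU consumed: CPU time delta over wall time delta. */
static double cpu_percent(long long cpu_delta_us, long long wall_delta_ns)
{
	if (wall_delta_ns <= 0)
		return 0.0;
	return 100.0 * (double)cpu_delta_us * 1000.0 / (double)wall_delta_ns;
}

static int overhead_init(struct overhead_state *st)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0 ||
	    clock_gettime(CLOCK_MONOTONIC, &st->last_wall) != 0)
		return -1;
	st->last_cpu_us = tv_to_us(ru.ru_utime) + tv_to_us(ru.ru_stime);
	st->last_nivcsw = ru.ru_nivcsw;
	st->warned = false;
	return 0;
}

/* Cheap enough to call every interval; samples at most once per second.
 * Does not include time spent in IPIs on remote CPUs. */
static void overhead_check(struct overhead_state *st)
{
	struct timespec now;
	struct rusage ru;
	long long wall_ns, cpu_us;
	long nivcsw;
	double pct;

	if (st->warned || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return;
	wall_ns = ts_to_ns(now) - ts_to_ns(st->last_wall);
	if (wall_ns < OVERHEAD_CHECK_INTERVAL_NS)
		return;
	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return;

	cpu_us = tv_to_us(ru.ru_utime) + tv_to_us(ru.ru_stime);
	pct = cpu_percent(cpu_us - st->last_cpu_us, wall_ns);
	nivcsw = ru.ru_nivcsw - st->last_nivcsw;

	/* Warn only if we use a noticeable share of a CPU AND are being
	 * preempted, i.e. likely competing with the workload. */
	if (pct >= OVERHEAD_WARN_PERCENT && nivcsw >= NIVCSW_WARN_THRESHOLD) {
		fprintf(stderr, "warning: monitoring overhead is %.1f%% of a CPU\n", pct);
		st->warned = true;
	}
	st->last_wall = now;
	st->last_cpu_us = cpu_us;
	st->last_nivcsw = ru.ru_nivcsw;
}
```

In an interval loop one would call overhead_init() once before the first interval and overhead_check() after each print; the early return keeps the per-interval cost to a single clock_gettime() call.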