Re: perf: bisected sampling bug in Linux 4.11-rc1

From: Vince Weaver
Date: Fri Jul 14 2017 - 16:14:39 EST


On Fri, 14 Jul 2017, Alexander Shishkin wrote:

> Vince Weaver <vincent.weaver@xxxxxxxxx> writes:
>
> > I was tracking down some regressions in my perf_event_test testsuite.
> > Some of the tests broke in the 4.11-rc1 timeframe.
> >
> > I've bisected one of them, this report is about
> > tests/overflow/simul_oneshot_group_overflow
> > This test creates an event group containing two sampling events, set
> > to overflow to a signal handler (which disables and then refreshes the
> > event).
> >
> > On a good kernel you get the following:
> > Event perf::instructions with period 1000000
> > Event perf::instructions with period 2000000
> > fd 3 overflows: 946 (perf::instructions/1000000)
> > fd 4 overflows: 473 (perf::instructions/2000000)
> > Ending counts:
> > Count 0: 946379875
> > Count 1: 946365218
> >
> > With the broken kernels you get:
> > Event perf::instructions with period 1000000
> > Event perf::instructions with period 2000000
> > fd 3 overflows: 938 (perf::instructions/1000000)
> > fd 4 overflows: 318 (perf::instructions/2000000)
> > Ending counts:
> > Count 0: 946373080
> > Count 1: 653373058
>
> I'm not sure I'm seeing it (granted, it's a friday evening): is it the
> difference in overflow counts?

It's two things.
The test creates an event group in which both events are
perf::instructions.

1. The total count at the end should be the same for both
(on the failing kernels it is not)
2. The overflow count for each event should be roughly
total_events/sample_period.
(on the failing kernels it is not)
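
As a rough illustration of those two checks, here is a minimal sketch in C.
It is not taken from the test source; the helper names and the tolerance
parameters are hypothetical, and it assumes the events use a fixed sample
period (as in the output quoted above).

```c
#include <stdint.h>

/* Expected number of overflows for a fixed sample period
 * (hypothetical helper, not part of perf_event_tests). */
static uint64_t expected_overflows(uint64_t count, uint64_t period)
{
    return period ? count / period : 0;
}

/* Returns 1 if a and b differ by at most tol, else 0. */
static int within(uint64_t a, uint64_t b, uint64_t tol)
{
    uint64_t diff = a > b ? a - b : b - a;
    return diff <= tol;
}

/* Check 1: two grouped events counting the same thing should end
 * with roughly the same total. The tolerance is an assumption. */
static int counts_match(uint64_t c0, uint64_t c1, uint64_t tol)
{
    return within(c0, c1, tol);
}

/* Check 2: the observed overflow count should be close to
 * count / period. The tolerance is an assumption. */
static int overflows_match(uint64_t overflows, uint64_t count,
                           uint64_t period, uint64_t tol)
{
    return within(overflows, expected_overflows(count, period), tol);
}
```

With the numbers above, counts_match(946379875, 946365218, 100000) holds on
the good kernel, while counts_match(946373080, 653373058, 100000) does not
on the broken one. Likewise, 318 overflows on fd 4 is far from the roughly
473 you would expect from the leader's count at a period of 2000000.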

> Also, are they cpu or task bound?

The open looks like this:
perf_event_open(&pe,0,-1,-1,0);

In the failing case, the group leader is pinned.
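
For reference, pid=0 with cpu=-1 means the events follow the calling task
on whatever CPU it runs on, so they are task bound. Below is a minimal
sketch of how such a group might be opened. It is not copied from the test;
the helper names are hypothetical, and it assumes perf::instructions maps to
PERF_TYPE_HARDWARE / PERF_COUNT_HW_INSTRUCTIONS and that the leader starts
disabled. The signal-handler setup is left out.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Build an attr for a sampling instructions event.
 * Hypothetical helper; the event type/config are an assumption. */
static struct perf_event_attr make_instructions_attr(unsigned long long period,
                                                     int is_leader)
{
    struct perf_event_attr pe;

    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.sample_period = period;
    /* pinned is only meaningful on a group leader */
    pe.pinned = is_leader ? 1 : 0;
    /* assumption: start the leader disabled, enable it later */
    pe.disabled = is_leader ? 1 : 0;
    return pe;
}

/* glibc provides no wrapper, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Open a two-event group on the calling task, any CPU.
 * Returns 0 on success and -1 on failure. */
static int open_group(int fds[2])
{
    struct perf_event_attr leader = make_instructions_attr(1000000, 1);
    struct perf_event_attr sibling = make_instructions_attr(2000000, 0);

    /* group_fd = -1 creates a new group with this event as leader */
    fds[0] = (int)perf_event_open(&leader, 0, -1, -1, 0);
    if (fds[0] < 0)
        return -1;

    /* passing the leader's fd as group_fd joins the same group */
    fds[1] = (int)perf_event_open(&sibling, 0, -1, fds[0], 0);
    if (fds[1] < 0) {
        close(fds[0]);
        return -1;
    }
    return 0;
}
```

The only difference between the two opens is the group_fd argument and the
leader-only pinned bit.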

The source code for the test is here:
https://github.com/deater/perf_event_tests/blob/master/tests/overflow/simul_oneshot_group_overflow.c

Vince