Re: perf,arm -- oops in validate_event

From: Mark Rutland
Date: Tue Aug 06 2013 - 09:08:41 EST


On Tue, Aug 06, 2013 at 12:59:21PM +0100, Will Deacon wrote:
> On Tue, Aug 06, 2013 at 12:19:32PM +0100, Mark Rutland wrote:
> > On Mon, Aug 05, 2013 at 10:17:37PM +0100, Vince Weaver wrote:
> > > It looks like in validate_event() we do
> > >
> > > struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
> > > ...
> > > return armpmu->get_event_idx(hw_events, event) >= 0;
> > >
> > > armpmu is read into r3, and somehow the value at the offset of
> > > armpmu->get_event_idx is either -1 or 0, so when it does a "blx"
> > > branch to the address at this offset we get the oops.
> > >
> > > c001bf8c: e3120010 tst r2, #16
> > > c001bf90: 0a000004 beq c001bfa8 <validate_event+0x48>
> > > c001bf94: e5933070 ldr r3, [r3, #112] ; 0x70
> > > * c001bf98: e12fff33 blx r3
> > > c001bf9c: e1e00000 mvn r0, r0
> > >
> > > I'm having trouble tracing the code back past that, and I don't have time
> > > to start adding printk's and recompiling right now.
> > >
> > > Vince
> >
> > I think I can save you the effort :)
> >
> > From the test case and the kernel code in question, it looks like the
> > following happens:
> >
> > * We create a software event, which becomes its own group leader.
> > * We create a hardware event, with the software event as its group
> > leader.
> > * When we try to initialise the hardware event, we try to validate all
> > events in its event group (the leader + siblings), but in doing so we
> > treat the software event as a hardware event, and erroneously try to
> > get its (non-existent) arm_pmu container, and call some garbage value
> > as get_event_idx(...).
> >
> > This could also happen if we tried to add events from different hardware
> > PMUs to the same group. I'm not sure if that's valid, but I couldn't
> > see any code preventing that, and it seems the x86 validation logic is
> > wired to allow this. If it's not valid, we could skip validation of
> > software events by checking with is_software_event.
>
> But we already check `event->pmu != leader_pmu' in validate_event, so we
> shouldn't get anywhere near calling get_event_idx in the case you
> describe. It sounds more like we have an inconsistency with one of the
> events.

Note in my example that the software event was the group leader (so in
fact we'd *only* be checking those events which we can't actually
handle...).
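To make this concrete, here is a small userspace model of the early-exit test at the top of validate_event() (the `event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF` check visible in the patch context below). The model_* types, the is_software flag (standing in for is_software_event()) and both function names are hypothetical; this is a sketch of the decision logic only, not kernel code. The second function is the skip-software-events variant suggested earlier in the thread.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the early-exit test at the top of
 * validate_event(). It answers one question only: would validate_event()
 * go on to call armpmu->get_event_idx() for this event?
 */

/* Values mirror enum perf_event_active_state in kernels of this era. */
enum model_state {
	MODEL_STATE_ERROR    = -2,
	MODEL_STATE_OFF      = -1,
	MODEL_STATE_INACTIVE =  0,
	MODEL_STATE_ACTIVE   =  1,
};

struct model_pmu {
	const char *name;
	bool is_software;	/* stand-in for what is_software_event() reports */
};

struct model_event {
	const struct model_pmu *pmu;
	const struct model_event *group_leader;
	enum model_state state;
};

/* The existing check: skip events on a different PMU from the group
 * leader, and events in an error state. */
static bool reaches_get_event_idx(const struct model_event *event)
{
	const struct model_pmu *leader_pmu = event->group_leader->pmu;

	if (event->pmu != leader_pmu || event->state < MODEL_STATE_OFF)
		return false;
	return true;
}

/* Hypothetical variant: also skip software events outright. */
static bool reaches_get_event_idx_skip_sw(const struct model_event *event)
{
	if (event->pmu->is_software)
		return false;
	return reaches_get_event_idx(event);
}
```

With a software leader, the existing check lets the leader through (so get_event_idx() would be called for a non-ARM PMU) while skipping the hardware sibling; the variant skips both. For an all-hardware group, both let the event through.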

I was also under the impression that in the case of mixed hardware and
software events, a hardware event must be the group leader. That
doesn't seem to be the case. If a hardware event is added to a software
group, the group is moved to hardware context but the original software
event stays as the group leader.

Thanks,
Mark.

>
> Can you dump the events as they're processed in validate_group please?

Sure. Patch and output below. I only get one output line before it
explodes.

Thanks,
Mark.

---->8----

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index d9f5cd4..cdff367 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -253,6 +253,11 @@ validate_event(struct pmu_hw_events *hw_events,
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
struct pmu *leader_pmu = event->group_leader->pmu;

+ printk("Event %p, PMU %p %s, leader PMU %p %s %s\n",
+ event, event->pmu, event->pmu->name,
+ leader_pmu, leader_pmu->name,
+ is_software_event(event) ? "Software" : "Hardware");
+
if (event->pmu != leader_pmu || event->state < PERF_EVENT_STATE_OFF)
return 1;

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f86599e..796f82b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5668,7 +5668,7 @@ static struct pmu perf_swevent = {
.start = perf_swevent_start,
.stop = perf_swevent_stop,
.read = perf_swevent_read,
-
+ .name = "perf_swevent",
.event_idx = perf_swevent_event_idx,
};

@@ -5788,6 +5788,7 @@ static struct pmu perf_tracepoint = {
.stop = perf_swevent_stop,
.read = perf_swevent_read,

+ .name = "perf_tracepoint",
.event_idx = perf_swevent_event_idx,
};

@@ -6014,7 +6015,7 @@ static struct pmu perf_cpu_clock = {
.start = cpu_clock_event_start,
.stop = cpu_clock_event_stop,
.read = cpu_clock_event_read,
-
+ .name = "perf_cpu_clock",
.event_idx = perf_swevent_event_idx,
};

@@ -6094,7 +6095,7 @@ static struct pmu perf_task_clock = {
.start = task_clock_event_start,
.stop = task_clock_event_stop,
.read = task_clock_event_read,
-
+ .name = "perf_task_clock",
.event_idx = perf_swevent_event_idx,
};

---->8----

Event 87210800, PMU 804d440c perf_task_clock, leader PMU 804d440c perf_task_clock Software
Unable to handle kernel NULL pointer dereference at virtual address 00000f58
pgd = 87380000
[00000f58] *pgd=672f9831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1235 Comm: a.out Not tainted 3.11.0-rc4+ #154
task: 87a0f840 ti: 866b6000 task.ti: 866b6000
PC is at 0x80000000
LR is at validate_event+0x98/0xa8
pc : [<80000000>] lr : [<80016ac8>] psr: 20000013
sp : 866b7e08 ip : 00000000 fp : 866b7f20
r10: 87a0f840 r9 : 00000001 r8 : 866b7e3c
r7 : 80417588 r6 : 804d440c r5 : 804d440c r4 : 87210800
r3 : 80000000 r2 : 80612974 r1 : 87210800 r0 : 866b7e3c
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 6738004a DAC: 00000015
Process a.out (pid: 1235, stack limit = 0x866b6238)
Stack: (0x866b7e08 to 0x866b8000)
7e00: 804d440c 80417588 80410e30 87210400 87a5859c 87210400
7e20: 87a58500 87210800 000000d3 87a585a0 00000000 80016cc8 00000000 867501b8
7e40: 866b7e38 87380000 87a58500 87210400 804d42d4 00000000 87210400 800856d0
7e60: 87210800 87a58500 00000000 00000001 00000000 00000002 00000000 800859d4
7e80: 00000000 00000000 00000000 00000000 00000029 00000800 00000000 87a0f840
7ea0: 87210800 00000000 00000000 00000000 866b6000 00000000 8790d9c0 80086754
7ec0: 00000000 00000000 00000000 00000004 00000004 00000000 00000000 00000000
7ee0: 00000000 00000000 00000000 00000000 00000000 00000000 0009104c 866b7fb0
7f00: 00000000 76f3b000 00000000 80008468 8742d388 87ae0000 00000001 00000000
7f20: 00000004 00000050 8dfff7d3 00000000 00000000 00000000 00000000 00000000
7f40: 00000000 00000000 001d4a0b 00000000 00000000 00000000 00000000 00000000
7f60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7f80: 866b6000 00000000 00000003 00000000 0000016c 8000e348 866b6000 00000000
7fa0: 00000000 8000e1a0 00000000 00000003 00093040 00000000 00000000 00000003
7fc0: 00000000 00000003 00000000 0000016c 00000000 00000000 76f3b000 00000000
7fe0: 7eb41740 7eb41730 00008451 76ec1ed0 40000010 00093040 e4836563 8503c5f2
[<80016ac8>] (validate_event+0x98/0xa8) from [<80016cc8>] (armpmu_event_init+0x1b8/0x27c)
[<80016cc8>] (armpmu_event_init+0x1b8/0x27c) from [<800856d0>] (perf_init_event+0xc8/0x104)
[<800856d0>] (perf_init_event+0xc8/0x104) from [<800859d4>] (perf_event_alloc+0x2c8/0x478)
[<800859d4>] (perf_event_alloc+0x2c8/0x478) from [<80086754>] (SyS_perf_event_open+0x86c/0x9d0)
[<80086754>] (SyS_perf_event_open+0x86c/0x9d0) from [<8000e1a0>] (ret_fast_syscall+0x0/0x30)
Code: bad PC value
---[ end trace 85dac5c0d80aac6d ]---
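Why this ends up calling a garbage pointer: the printk line shows that the event being validated is the perf_task_clock software event, and because it is its own group leader, `event->pmu != leader_pmu` is false, so validate_event() carries on with to_arm_pmu(event->pmu). That container_of() only applies a fixed offset, so for a struct pmu that is not embedded in a struct arm_pmu, the get_event_idx pointer is read from whatever memory sits at that offset (0x70 past the computed arm_pmu address in the disassembly Vince quoted). In this run the loaded value was 0x80000000, which is where both r3 and PC ended up. The sketch below models only that address arithmetic. The trimmed struct layouts and the helpers container_addr() and get_event_idx_slot() are hypothetical, with pmu placed first as an assumption. Addresses are computed as integers so nothing invalid is ever dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-ins for the kernel types. */
struct pmu {
	const char *name;
};

struct arm_pmu {
	struct pmu pmu;			/* embedded; placed first here */
	const char *name;
	int (*get_event_idx)(void);	/* real hook takes (hw_events, event) */
};

/* container_of(p, struct arm_pmu, pmu), but on integers, so modelling a
 * bogus container never forms or dereferences an invalid pointer. */
static uintptr_t container_addr(const struct pmu *p)
{
	return (uintptr_t)p - offsetof(struct arm_pmu, pmu);
}

/* Address the get_event_idx pointer would be loaded from. */
static uintptr_t get_event_idx_slot(const struct pmu *p)
{
	return container_addr(p) + offsetof(struct arm_pmu, get_event_idx);
}
```

For a genuine arm_pmu the slot is its own get_event_idx field. For a standalone software pmu it lands past the end of that object, in unrelated memory.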