[PATCH] perf/core: Fix endless multiplex timer

From: kan . liang
Date: Tue Mar 03 2020 - 15:29:22 EST


From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

A lot of time is spent writing uncore MSRs even though no perf session
is running.

   4.66%  swapper  [kernel.kallsyms]  [k] native_write_msr
            |
             --4.56%--native_write_msr
                       |
                       |--1.68%--snbep_uncore_msr_enable_box
                       |          perf_mux_hrtimer_handler
                       |          __hrtimer_run_queues
                       |          hrtimer_interrupt
                       |          smp_apic_timer_interrupt
                       |          apic_timer_interrupt
                       |          cpuidle_enter_state
                       |          cpuidle_enter
                       |          do_idle
                       |          cpu_startup_entry
                       |          start_kernel
                       |          secondary_startup_64

The root cause is that the multiplex timer is not stopped when perf stat
finishes.

Perf currently relies on rotate_necessary to determine whether the
multiplex timer should be stopped. The variable is only reset in
ctx_sched_out(), which is not enough for system-wide events.

Perf stat invokes PERF_EVENT_IOC_DISABLE to stop a system-wide event
before closing it:

  perf_ioctl()
    perf_event_disable()
      event_sched_out()

On this path, rotate_necessary is never reset.
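
The sequence above can be illustrated with a small user-space model. This
is only a sketch, not kernel code: struct toy_ctx and the toy_* helpers
are hypothetical stand-ins that mirror the counters named in this patch
(nr_events, nr_active, rotate_necessary) and nothing else.

```c
#include <assert.h>

/* Hypothetical stand-in for the few context fields involved here. */
struct toy_ctx {
	int nr_events;        /* events attached to the context */
	int nr_active;        /* events currently scheduled in */
	int rotate_necessary; /* non-zero keeps the multiplex timer going */
};

/* Models event_sched_out(): drops one active event, leaves the flag alone. */
static void toy_event_sched_out(struct toy_ctx *ctx)
{
	assert(ctx->nr_active > 0);
	ctx->nr_active--;
}

/* Models ctx_sched_out(): the only place where the flag gets cleared. */
static void toy_ctx_sched_out(struct toy_ctx *ctx)
{
	ctx->nr_active = 0;
	ctx->rotate_necessary = 0;
}

/* Models the timer's decision: non-zero means the timer is restarted. */
static int toy_mux_timer_should_restart(const struct toy_ctx *ctx)
{
	return ctx->rotate_necessary;
}
```

In this model, calling toy_event_sched_out() on a context that was
multiplexing leaves toy_mux_timer_should_restart() returning non-zero.
Only toy_ctx_sched_out() makes it return zero, and the disable path
never calls it.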

The issue is generic and does not only affect the uncore.

Check whether we had been multiplexing. If yes, reset rotate_necessary
for the last active event in __perf_event_disable().
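
As a rough user-space illustration of that check (a sketch under stated
assumptions, not the kernel implementation): struct mux_ctx and
mux_disable_active_event() are hypothetical names, and only the field
names and the condition itself are taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in; field names follow the ones used in the patch. */
struct mux_ctx {
	int nr_events;
	int nr_active;
	int rotate_necessary;
};

/*
 * Models the check added to __perf_event_disable(), which runs before
 * the event is scheduled out. The patch treats nr_events != nr_active
 * as "we had been multiplexing", and nr_active == 1 as "the event
 * being disabled is the last active one".
 */
static void mux_disable_active_event(struct mux_ctx *ctx)
{
	assert(ctx->nr_active > 0);

	if (ctx->nr_events != ctx->nr_active && ctx->nr_active == 1)
		ctx->rotate_necessary = 0;

	/* Stands in for group_sched_out() / event_sched_out(). */
	ctx->nr_active--;
}
```

With three events of which one is active, disabling that event clears
rotate_necessary in the model. With two active events the flag is left
untouched, since another active event may still need rotation.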

Fixes: fd7d55172d1e ("perf/cgroups: Don't rotate events for cgroups unnecessarily")
Reported-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Reviewed-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
---
kernel/events/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3f1f77de7247..50688de56181 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2242,6 +2242,16 @@ static void __perf_event_disable(struct perf_event *event,
 		update_cgrp_time_from_event(event);
 	}
 
+	/*
+	 * If we had been multiplexing,
+	 * stop the rotations for the last active event.
+	 * Only need to check system wide events.
+	 * For task events, it will be checked in ctx_sched_out().
+	 */
+	if ((cpuctx->ctx.nr_events != cpuctx->ctx.nr_active) &&
+	    (cpuctx->ctx.nr_active == 1))
+		cpuctx->ctx.rotate_necessary = 0;
+
 	if (event == event->group_leader)
 		group_sched_out(event, cpuctx, ctx);
 	else
--
2.17.1