Re: [patch 1/5] perf: Fixup hrtimer forward wreckage

From: Stephane Eranian
Date: Tue Apr 14 2015 - 18:10:43 EST


On Mon, Apr 13, 2015 at 2:02 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> perf_event_mux_interval_ms_store() tries to apply an updated
> hrtimer_interval to all possible cpus in a completely unsafe way. The
> changelog of the offending commit says:
>
> "In the 5th version, we handle the reprogramming of the hrtimer using
> hrtimer_forward_now(). That way, we sync up to new timer value
> quickly (suggested by Jiri Olsa)."
>
> The hrtimer reprogramming is completely unrelated to
> hrtimer_forward_now(). hrtimer_forward_now() merily forwards the
> expiry time of the hrtimer past now and that's where things get really
> bad in this code:
>
> The forward function is called on enqueued timers. That means the
> update of the expiry time corrupts the ordering of the hrtimer rbtree.
>
> The proper way to update hrtimers on remote cpus is to use a smp
> function call on all online cpus and perform the update locally by
> canceling the timer, forwarding and restarting it.
>
> Fixes: 62b856397927 "perf: Add sysfs entry to adjust multiplexing interval per PMU"
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Stephane Eranian <eranian@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>


Thanks for fixing this Thomas!

>
> ---
> kernel/events/core.c | 36 ++++++++++++++++++++----------------
> 1 file changed, 20 insertions(+), 16 deletions(-)
>
> Index: linux/kernel/events/core.c
> ===================================================================
> --- linux.orig/kernel/events/core.c
> +++ linux/kernel/events/core.c
> @@ -6827,13 +6827,27 @@ perf_event_mux_interval_ms_show(struct d
> return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->hrtimer_interval_ms);
> }
>
> +static void perf_event_mux_update_interval(void *info)
> +{
> + struct pmu *pmu = info;
> + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
> + int was_armed = hrtimer_cancel(&cpuctx->hrtimer);
> +
> + /* Update the interval and restart the timer with the new interval */
> + cpuctx->hrtimer_interval = ms_to_ktime(pmu->hrtimer_interval_ms);
> + if (was_armed) {
> + hrtimer_forward_now(&cpuctx->hrtimer, cpuctx->hrtimer_interval);
> + hrtimer_start_expires(&cpuctx->hrtimer, HRTIMER_MODE_ABS);
> + }
> +}
> +
> static ssize_t
> perf_event_mux_interval_ms_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t count)
> {
> struct pmu *pmu = dev_get_drvdata(dev);
> - int timer, cpu, ret;
> + int timer, ret;
>
> ret = kstrtoint(buf, 0, &timer);
> if (ret)
> @@ -6842,22 +6856,12 @@ perf_event_mux_interval_ms_store(struct
> if (timer < 1)
> return -EINVAL;
>
> - /* same value, noting to do */
> - if (timer == pmu->hrtimer_interval_ms)
> - return count;
> -
> - pmu->hrtimer_interval_ms = timer;
> -
> - /* update all cpuctx for this PMU */
> - for_each_possible_cpu(cpu) {
> - struct perf_cpu_context *cpuctx;
> - cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> - cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
> -
> - if (hrtimer_active(&cpuctx->hrtimer))
> - hrtimer_forward_now(&cpuctx->hrtimer, cpuctx->hrtimer_interval);
> + if (timer != pmu->hrtimer_interval_ms) {
> + get_online_cpus();
> + pmu->hrtimer_interval_ms = timer;
> + on_each_cpu(perf_event_mux_update_interval, pmu, 1);
> + put_online_cpus();
> }
> -
> return count;
> }
> static DEVICE_ATTR_RW(perf_event_mux_interval_ms);
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/