Re: [PATCH v6] hrtimer: avoid retrigger_next_event IPI

From: Thomas Gleixner
Date: Mon Apr 19 2021 - 16:54:54 EST


On Mon, Apr 19 2021 at 16:39, Marcelo Tosatti wrote:
>
> +static void clock_was_set_force_reprogram_work(struct work_struct *work)
> +{
> + clock_was_set(true);
> +}
> +
> +static DECLARE_WORK(hrtimer_force_reprogram_work, clock_was_set_force_reprogram_work);
> +
> +
> static void clock_was_set_work(struct work_struct *work)
> {
> - clock_was_set();
> + clock_was_set(false);
> }
>
> static DECLARE_WORK(hrtimer_work, clock_was_set_work);
> @@ -769,9 +777,12 @@ static DECLARE_WORK(hrtimer_work, clock_was_set_work);
> * Called from timekeeping and resume code to reprogram the hrtimer
> * interrupt device on all cpus.
> */
> -void clock_was_set_delayed(void)
> +void clock_was_set_delayed(bool force_reprogram)
> {
> - schedule_work(&hrtimer_work);
> + if (force_reprogram)
> + schedule_work(&hrtimer_force_reprogram_work);
> + else
> + schedule_work(&hrtimer_work);
> }
>
> #else
> @@ -871,6 +882,18 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
> tick_program_event(expires, 1);
> }
>
> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) | \
> + (1U << HRTIMER_BASE_REALTIME_SOFT) | \
> + (1U << HRTIMER_BASE_TAI) | \
> + (1U << HRTIMER_BASE_TAI_SOFT) | \
> + (1U << HRTIMER_BASE_BOOTTIME) | \
> + (1U << HRTIMER_BASE_BOOTTIME_SOFT))
> +
> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
> +{
> + return (cpu_base->active_bases & CLOCK_SET_BASES) != 0;
> +}
> +
> /*
> * Clock realtime was set
> *
> @@ -882,11 +905,42 @@ static void hrtimer_reprogram(struct hrtimer *timer, bool reprogram)
> * resolution timer interrupts. On UP we just disable interrupts and
> * call the high resolution interrupt code.
> */
> -void clock_was_set(void)
> +void clock_was_set(bool force_reprogram)
> {
> #ifdef CONFIG_HIGH_RES_TIMERS
> - /* Retrigger the CPU local events everywhere */
> - on_each_cpu(retrigger_next_event, NULL, 1);
> + cpumask_var_t mask;
> + int cpu;
> +
> + if (force_reprogram == true) {
> + on_each_cpu(retrigger_next_event, NULL, 1);
> + goto set_timerfd;
> + }
> +
> + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
> + on_each_cpu(retrigger_next_event, NULL, 1);
> + goto set_timerfd;
> + }

This is really horrible and the proper thing to do is to seperate the
s2idle/resume related functionality which allows to do some other
cleanups as well.

I already started to look at that over the weekend, but got stuck due to
a couple of correctness issues I found with the whole clock_was_set()
mechanism while looking. That stuff needs to be fixed first.

Back to the above. For the s2idle resume path there is no reason for an
IPI at all. It's just that way because it has been bolted on the
existing machinery.

So today it does:

tick_unfreeze() {
if (first_cpu_to_unfreeze()) {
timekeeping_resume();
tick_resume();
tick_resume_local();
clock_was_set_delayed();
} else {
tick_resume_local();
}
}

Then after everything is unfrozen including workqueues the delayed
clock_was_set() runs and does the IPI dance. That's just silly.

We can be smarter than that:

tick_unfreeze() {
if (first_cpu_to_unfreeze())
timekeeping_resume();
tick_resume();
tick_resume_local();
hrtimer_resume_local();
} else {
tick_resume_local();
hrtimer_resume_local();
}
}

See? No IPI required at all and no magic force argument and as a bonus
we get also understandable code.

Splitting this properly allows to be smarter about the affected clocks
because most of the clock_was_set() events only care about
CLOCK_REALTIME and CLOCK_TAI and just the sleep time injection affects
CLOCK_BOOTTIME as well.

There are a few other things which can be done to avoid even more IPIs,
but let me address the correctness issues of clock_was_set() first.

Thanks,

tglx