Re: [PATCH v5] hrtimer: avoid retrigger_next_event IPI

From: Thomas Gleixner
Date: Sat Apr 17 2021 - 12:51:14 EST


On Sat, Apr 17 2021 at 18:24, Thomas Gleixner wrote:
> On Fri, Apr 16 2021 at 13:13, Peter Xu wrote:
>> On Fri, Apr 16, 2021 at 01:00:23PM -0300, Marcelo Tosatti wrote:
>>>
>>> +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME) | \
>>> + (1U << HRTIMER_BASE_REALTIME_SOFT) | \
>>> + (1U << HRTIMER_BASE_TAI) | \
>>> + (1U << HRTIMER_BASE_TAI_SOFT))
>>> +
>>> +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base)
>>> +{
>>> + if (cpu_base->softirq_activated)
>>> + return true;
>>
>> A pure question on whether this check is needed...
>>
>> Here even if softirq_activated==1 (as softirq is going to happen), as long as
>> (cpu_base->active_bases & CLOCK_SET_BASES)==0, shouldn't it already mean that
>> "yes indeed clock was set, but no need to kick this cpu as no relevant timer"?
>> As that question seems to be orthogonal to whether a softirq is going to
>> trigger on that cpu.
>
> That's correct and it's not any different from firing the IPI because in
> both cases the update happens with the base lock of the CPU in question
> held. And if there are no active timers in any of the affected bases,
> then there is no need to reevaluate the next expiry because the offset
> update does not affect any armed timers. It just makes sure that the
> next enqueu of a timer on such a base will see the the correct offset.
>
> I'll just zap it.

But the whole thing is still wrong in two aspects:

1) BOOTTIME can be one of the affected clocks when sleep time
(suspended time) is injected because that uses the same mechanism.

Sorry for missing that earlier when I asked to remove it, but
that's trivial to fix by adding the BOOTTIME base back.

2) What's worse is that on resume this might break because that
mechanism is also used to enforce the reprogramming of the clock
event devices and there we cannot be selective on clock bases.

I need to dig deeper into that because suspend/resume has changed
a lot over time, so this might be just a historical leftover. But
without proper analysis we might end up with subtle and hard to
debug wreckage.

Thanks,

tglx