Re: [PATCH] Clocksource: Avoid misjudgment of clocksource

From: brookxu
Date: Fri Oct 15 2021 - 03:24:21 EST


Hello

>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>> index b8a14d2..87f3b67 100644
>>> --- a/kernel/time/clocksource.c
>>> +++ b/kernel/time/clocksource.c
>>> @@ -119,6 +119,7 @@
>>>   static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
>>>   static DEFINE_SPINLOCK(watchdog_lock);
>>>   static int watchdog_running;
>>> +static unsigned long watchdog_start_time;
>>>   static atomic_t watchdog_reset_pending;
>>>   static inline void clocksource_watchdog_lock(unsigned long *flags)
>>> @@ -356,6 +357,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>>>       int next_cpu, reset_pending;
>>>       int64_t wd_nsec, cs_nsec;
>>>       struct clocksource *cs;
>>> +    unsigned long max_jiffies;
>>>       u32 md;
>>>       spin_lock(&watchdog_lock);
>>> @@ -402,6 +404,10 @@ static void clocksource_watchdog(struct timer_list *unused)
>>>           if (atomic_read(&watchdog_reset_pending))
>>>               continue;
>>> +        max_jiffies = nsecs_to_jiffies(cs->max_idle_ns);
>>> +        if (time_is_before_jiffies(watchdog_start_time + max_jiffies))
>>> +            continue;
>>> +
>
> Hi John:
> What do you think of this suggest?If the interval between two executions of the function clocksource_watchdog() exceeds max_idle_ns. We think the current judement is unreasonable,so we skip this judgment.

I feel that there are still some things that need to be discussed. This
solution may still fail in some scenarios. Assume that tick_do_timer_cpu
is CPU1 and clocksource watchdog is CPU2. At a certain point in time, CPU1
updates jiffies, and then the interrupt is closed for some reason, then
jiffies will not be updated. At the same time, the clocksource watchdog of
CPU2 is activated, and still delayed for a period of time due to high load.
Since the jiffies is not updated, this judgment should fail at this time.
But I think it might be necessary to troubleshoot other problems, because
the interrupt should not closed for a long time. How do you think about
this scene.

Thanks.
>>>           /* Check the deviation from the watchdog clocksource. */
>>>           md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>>>           if (abs(cs_nsec - wd_nsec) > md) {
>>> @@ -474,6 +480,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>>>        * pair clocksource_stop_watchdog() clocksource_start_watchdog().
>>>        */
>>>       if (!timer_pending(&watchdog_timer)) {
>>> +        watchdog_start_time = jiffies;
>>>           watchdog_timer.expires += WATCHDOG_INTERVAL;
>>>           add_timer_on(&watchdog_timer, next_cpu);
>>>       }
>>> @@ -488,6 +495,7 @@ static inline void clocksource_start_watchdog(void)
>>>       timer_setup(&watchdog_timer, clocksource_watchdog, 0);
>>>       watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
>>>       add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
>>> +    watchdog_start_time = jiffies;
>>>       watchdog_running = 1;
>>>   }
>>>
>>>
>>>> thanks
>>>> -john
>>>>
>>
>> It looks good to me.
>> thanks.
>