[PATCH] Clocksource: Avoid misjudgment of clocksource

From: yanghui
Date: Fri Oct 15 2021 - 02:57:09 EST




On 2021/10/14 3:03 PM, yanghui wrote:


On 2021/10/12 4:06 PM, brookxu wrote:


John Stultz wrote on 2021/10/12 13:29:
On Mon, Oct 11, 2021 at 10:23 PM brookxu <brookxu.cn@xxxxxxxxx> wrote:
John Stultz wrote on 2021/10/12 12:52 PM:
On Sat, Oct 9, 2021 at 7:04 AM brookxu <brookxu.cn@xxxxxxxxx> wrote:

hello

John Stultz wrote on 2021/10/9 7:45:
On Fri, Oct 8, 2021 at 1:03 AM yanghui <yanghui.def@xxxxxxxxxxxxx> wrote:

clocksource_watchdog() is executed every WATCHDOG_INTERVAL (0.5s) by a
timer. But sometimes the system is very busy and the timer cannot fire
within 0.5s. For example, if clocksource_watchdog() is executed only
after 10s, the calculated value of abs(cs_nsec - wd_nsec) will be
enlarged and the current clocksource will be misjudged as unstable.
So add conditions to prevent the clocksource from being misjudged.

Signed-off-by: yanghui <yanghui.def@xxxxxxxxxxxxx>
---
  kernel/time/clocksource.c | 6 +++++-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b8a14d2fb5ba..d535beadcbc8 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -136,8 +136,10 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);

  /*
   * Interval: 0.5sec.
+ * MaxInterval: 1s.
   */
  #define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)

  static void clocksource_watchdog_work(struct work_struct *work)
  {
@@ -404,7 +406,9 @@ static void clocksource_watchdog(struct timer_list *unused)

                 /* Check the deviation from the watchdog clocksource. */
                 md = cs->uncertainty_margin + watchdog->uncertainty_margin;
-               if (abs(cs_nsec - wd_nsec) > md) {
+               if ((abs(cs_nsec - wd_nsec) > md) &&
+                       cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&

Sorry, it's been a while since I looked at this code, but why are you
bounding the clocksource delta here?
It seems like if the clocksource being watched was very wrong (with a
delta larger than the MAX_INTERVAL_NS), we'd want to throw it out.

+                       wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {

Bounding the watchdog interval on the check does seem reasonable.
Though one may want to keep track, so that if we are seeing too many of
these delayed watchdog checks we can provide some feedback via dmesg.
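
As a rough illustration of that kind of dmesg feedback (just a sketch: the
counter name and the reporting threshold are made up, not part of any posted
patch), something along these lines inside the per-clocksource loop of
clocksource_watchdog(), after cs_nsec/wd_nsec have been computed:

	/* Hypothetical: count delayed checks and occasionally report them. */
	static unsigned int watchdog_delay_count;

	if (cs_nsec > WATCHDOG_MAX_INTERVAL_NS ||
	    wd_nsec > WATCHDOG_MAX_INTERVAL_NS) {
		if (++watchdog_delay_count % 16 == 0)
			pr_warn("clocksource: watchdog check delayed %u times, skipping deviation check\n",
				watchdog_delay_count);
		/* Don't mark the clocksource unstable on a delayed check. */
		continue;
	}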

For some fast-wrapping timers, such as acpi-timer, checking wd_nsec may not
make much sense, because by the time the watchdog is called, the timer may
have overflowed many times.
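
(As a rough illustration of that overflow concern, assuming the acpi_pm
clocksource's usual 24-bit mask at 3.579545 MHz:

    wrap period  ~ 2^24 / 3,579,545 Hz ~ 4.69 s
    a 10 s delay = two full wraps + ~0.63 s remainder

so the masked delta yields a wd_nsec of only about 0.63 s rather than 10 s,
and bounding wd_nsec alone cannot tell a slightly delayed check from one
delayed by several wrap periods.)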

Indeed. But in that case we can't tell which way is up. This is what I
was fretting about when I said:
So I do worry these watchdog robustness fixes are papering over a
problem, pushing expectations closer to the edge of how far the system
should tolerate bad behavior. Because at some point we'll fall off. :)

If the timer is delayed long enough for the watchdog to wrap, we're
way out of tolerable behavior. There's not much we can do because we
can't even tell what happened.

But in the case where the watchdog has not wrapped, I don't see a
major issue with trying to be a bit more robust in the face of just
slightly delayed timers.
(And yes, we can't really distinguish between slightly delayed and
watchdog-wrap-interval + slight delay, but in either case we can
probably skip disqualifying the clocksource as we know something seems
off)

What if we record the watchdog's start_time in clocksource_start_watchdog(),
and then, when we verify cycles in clocksource_watchdog(), check whether the
watchdog was blocked? Due to MSB verification, if the blocked time is greater
than half of the watchdog timer's max_cycles, then we can safely ignore the
current verification. Do you think this idea is okay?

I can't say I totally understand the idea. Maybe could you clarify with a patch?


Sorry, it looks roughly as follows:

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b8a14d2..87f3b67 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -119,6 +119,7 @@
  static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
  static DEFINE_SPINLOCK(watchdog_lock);
  static int watchdog_running;
+static unsigned long watchdog_start_time;
  static atomic_t watchdog_reset_pending;
  static inline void clocksource_watchdog_lock(unsigned long *flags)
@@ -356,6 +357,7 @@ static void clocksource_watchdog(struct timer_list *unused)
      int next_cpu, reset_pending;
      int64_t wd_nsec, cs_nsec;
      struct clocksource *cs;
+    unsigned long max_jiffies;
      u32 md;
      spin_lock(&watchdog_lock);
@@ -402,6 +404,10 @@ static void clocksource_watchdog(struct timer_list *unused)
          if (atomic_read(&watchdog_reset_pending))
              continue;
+        max_jiffies = nsecs_to_jiffies(cs->max_idle_ns);
+        if (time_is_before_jiffies(watchdog_start_time + max_jiffies))
+            continue;
+

Hi John:
What do you think of this suggestion? If the interval between two executions of clocksource_watchdog() exceeds max_idle_ns, we think the current judgment is unreasonable, so we skip it.

          /* Check the deviation from the watchdog clocksource. */
          md = cs->uncertainty_margin + watchdog->uncertainty_margin;
          if (abs(cs_nsec - wd_nsec) > md) {
@@ -474,6 +480,7 @@ static void clocksource_watchdog(struct timer_list *unused)
       * pair clocksource_stop_watchdog() clocksource_start_watchdog().
       */
      if (!timer_pending(&watchdog_timer)) {
+        watchdog_start_time = jiffies;
          watchdog_timer.expires += WATCHDOG_INTERVAL;
          add_timer_on(&watchdog_timer, next_cpu);
      }
@@ -488,6 +495,7 @@ static inline void clocksource_start_watchdog(void)
      timer_setup(&watchdog_timer, clocksource_watchdog, 0);
      watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
      add_timer_on(&watchdog_timer, cpumask_first(cpu_online_mask));
+    watchdog_start_time = jiffies;
      watchdog_running = 1;
  }


thanks
-john


It looks good to me.
Thanks.