[PATCH] watchdog: revert cleanup handling of false positives

From: Sergey Senozhatsky
Date: Mon May 17 2021 - 10:09:41 EST


This reverts commit 9bf3bc949f8aeefeacea4b1198db833b722a8e27.

I can reproduce the case when resumed VCPU starts to execute
is_softlockup() with PVCLOCK_GUEST_STOPPED set on this VCPU:

watchdog_timer_fn()
{
...

kvm_check_and_clear_guest_paused();

...

duration = is_softlockup(touch_ts, period_ts);
if (unlikely(duration)) {
....
}
}

Which means that guest VCPU has been suspended between
kvm_check_and_clear_guest_paused() and is_softlockup(),
and jiffies (clock) thus shifted forward.

VCPU can be preempted anywhere, so we want to do
kvm_check_and_clear_guest_paused() at the very last
moment: when we are in the soft-lockup branch but
then we figure out that it's false positive and we
need to bail out.

Signed-off-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
---
kernel/watchdog.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7c397907d0e9..8cf0678378d2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -376,14 +376,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* .. and repeat */
hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

- /*
- * If a virtual machine is stopped by the host it can look to
- * the watchdog like a soft lockup. Check to see if the host
- * stopped the vm before we process the timestamps.
- */
- kvm_check_and_clear_guest_paused();
-
- /* Reset the interval when touched by known problematic code. */
+ /* Reset the interval when touched externally by a known slow code. */
if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
/*
@@ -394,7 +387,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
sched_clock_tick();
}

+ /* Clear the guest paused flag on watchdog reset */
+ kvm_check_and_clear_guest_paused();
update_report_ts();
+
return HRTIMER_RESTART;
}

@@ -406,6 +402,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
*/
duration = is_softlockup(touch_ts, period_ts);
if (unlikely(duration)) {
+ /*
+ * If a virtual machine is stopped by the host it can look to
+ * the watchdog like a soft lockup, check to see if the host
+ * stopped the vm before we issue the warning
+ */
+ if (kvm_check_and_clear_guest_paused())
+ return HRTIMER_RESTART;
+
/*
* Prevent multiple soft-lockup reports if one cpu is already
* engaged in dumping all cpu back traces.
--
2.31.1.751.gd2f1c929bd-goog