[PATCH 1/2] tick-sched: Do not clear the iowait and idle times

From: Tom Hromatka
Date: Wed Jun 10 2020 - 17:08:06 EST


A customer reported that when a cpu goes offline and then comes back
online, the aggregate cpu idle and iowait counters in /proc/stat
decrease. This breaks their cpu usage calculations, which assume that
these counters only ever increase.

Prior to this patch:

         user  nice  system     idle  iowait
cpu   1390748   636  209444  9802206   19598
cpu1   178384    75   24545  1392450    3025

take cpu1 offline and bring it back online

         user  nice  system     idle  iowait
cpu   1391209   636  209682  8453440   16595
cpu1   178440    75   24572      627       0
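
The effect on a monitoring tool can be sketched in userspace. The
following is a minimal illustration, not the customer's actual code: the
struct and function names are hypothetical, and it assumes the tool
samples the idle and iowait columns of a /proc/stat "cpu" line and
subtracts consecutive samples.

```c
#include <stdint.h>

/* Hypothetical sample of the idle and iowait columns of one "cpu" line. */
struct cpu_sample {
	uint64_t idle;
	uint64_t iowait;
};

/*
 * Signed difference between two samples of one counter. A negative
 * result means the counter went backwards between the samples.
 * Assumes both values fit in 63 bits.
 */
static int64_t counter_delta(uint64_t prev, uint64_t cur)
{
	return (int64_t)cur - (int64_t)prev;
}

/* Return 1 if either counter decreased between prev and cur, else 0. */
static int sample_went_backwards(const struct cpu_sample *prev,
				 const struct cpu_sample *cur)
{
	return counter_delta(prev->idle, cur->idle) < 0 ||
	       counter_delta(prev->iowait, cur->iowait) < 0;
}
```

Fed the aggregate "cpu" values from the two samples above, the idle
delta is 8453440 - 9802206 = -1348766 and the iowait delta is
16595 - 19598 = -3003, so any usage percentage derived from these deltas
is meaningless for that interval.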

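The counters drop because tick_cancel_sched_timer() clears the entire
per-cpu tick_sched structure, including idle_sleeptime and
iowait_sleeptime. To prevent this, save both values before the memset()
and restore them afterwards, so that the idle and iowait times of a cpu
that comes back online continue from where they left off.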

With this patch:

        user  nice  system    idle  iowait
cpu   129913    17   17590  166512     704
cpu1   15916     3    2395   20989      47

take cpu1 offline and bring it back online

        user  nice  system    idle  iowait
cpu   130089    17   17686  184625     711
cpu1   15942     3    2401   23088      47

Signed-off-by: Tom Hromatka <tom.hromatka@xxxxxxxxxx>
---
kernel/time/tick-sched.c | 9 +++++++++
1 file changed, 9 insertions(+)
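
For anyone who wants to try the save/restore pattern outside the kernel,
here is a rough userspace model of the change in the diff below. It is
only a sketch: struct tick_sched_model and model_cancel_sched_timer()
are hypothetical stand-ins, the struct has a handful of illustrative
fields rather than the real struct tick_sched layout, and int64_t stands
in for ktime_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct tick_sched. */
struct tick_sched_model {
	int64_t idle_sleeptime;		/* must survive the reset */
	int64_t iowait_sleeptime;	/* must survive the reset */
	int64_t last_tick;		/* example of state that is cleared */
	int tick_stopped;		/* example of state that is cleared */
};

/*
 * Model of the patched reset: zero the whole structure, but carry the
 * two accumulated sleep times across the memset() so they never
 * decrease.
 */
static void model_cancel_sched_timer(struct tick_sched_model *ts)
{
	int64_t idle_sleeptime = ts->idle_sleeptime;
	int64_t iowait_sleeptime = ts->iowait_sleeptime;

	memset(ts, 0, sizeof(*ts));
	ts->idle_sleeptime = idle_sleeptime;
	ts->iowait_sleeptime = iowait_sleeptime;
}
```

After a call to model_cancel_sched_timer(), every field of the model is
zero except the two sleep times, which keep their previous values.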

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3e2dc9b8858c..8103bad7bbd6 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1375,13 +1375,22 @@ void tick_setup_sched_timer(void)
 void tick_cancel_sched_timer(int cpu)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t idle_sleeptime, iowait_sleeptime;

 # ifdef CONFIG_HIGH_RES_TIMERS
 	if (ts->sched_timer.base)
 		hrtimer_cancel(&ts->sched_timer);
 # endif

+	/* save off and restore the idle_sleeptime and the iowait_sleeptime
+	 * to avoid discontinuities and ensure that they are monotonically
+	 * increasing
+	 */
+	idle_sleeptime = ts->idle_sleeptime;
+	iowait_sleeptime = ts->iowait_sleeptime;
 	memset(ts, 0, sizeof(*ts));
+	ts->idle_sleeptime = idle_sleeptime;
+	ts->iowait_sleeptime = iowait_sleeptime;
 }
 #endif

--
2.25.3