Re: [RESEND PATCH 1/2] tick-sched: Do not clear the iowait and idle times

From: Thomas Gleixner
Date: Sun Sep 13 2020 - 17:27:26 EST


Tom,

On Wed, Sep 09 2020 at 08:41, Tom Hromatka wrote:
> A customer reported that when a cpu goes offline and then comes back
> online, the overall cpu idle and iowait data in /proc/stat decreases.
> This is wreaking havoc with their cpu usage calculations.

for a changelog it's pretty irrelevant whether a customer reported
something or not.

Fact is that this happens and you fail to explain WHY it happens,
i.e. because the values are cleared when the CPU goes down and therefore
the accounting starts over from 0 when the CPU comes online again.

Describing this is much more useful than showing random numbers before
and after.
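
To illustrate the mechanism, here is a minimal userspace sketch, not kernel code. The struct, the array and the helper names are all hypothetical stand-ins, and the summation only approximates how the aggregate line in /proc/stat is built from per-CPU values:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_CPUS 2	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the per-CPU tick_sched data. */
struct tick_sched_sketch {
	uint64_t idle_sleeptime;
	uint64_t iowait_sleeptime;
	int tick_stopped;
};

static struct tick_sched_sketch cpu_ts[SKETCH_NR_CPUS];

/* Sum the idle time of all CPUs, as an aggregate reader would. */
static uint64_t total_idle(void)
{
	uint64_t sum = 0;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += cpu_ts[cpu].idle_sleeptime;
	return sum;
}

/* Models the unpatched offline path: the whole struct is wiped. */
static void cpu_offline_clearing(int cpu)
{
	memset(&cpu_ts[cpu], 0, sizeof(cpu_ts[cpu]));
}
```

With idle times of 100 and 250, the total is 350. After cpu_offline_clearing(1) it drops to 100, which is the decrease an observer of the aggregate would see.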

> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -1375,13 +1375,22 @@ void tick_setup_sched_timer(void)
>  void tick_cancel_sched_timer(int cpu)
>  {
>  	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> +	ktime_t idle_sleeptime, iowait_sleeptime;
>
>  # ifdef CONFIG_HIGH_RES_TIMERS
>  	if (ts->sched_timer.base)
>  		hrtimer_cancel(&ts->sched_timer);
>  # endif
>
> +	/* save off and restore the idle_sleeptime and the iowait_sleeptime
> +	 * to avoid discontinuities and ensure that they are monotonically
> +	 * increasing
> +	 */

	/*
	 * Please use sane multiline comment style and not the above
	 * abomination.
	 */

Also please explain what this 'monotonically increasing' thing is
about. Without consulting the changelog it's hard to figure out what
that means.

Comments are valuable but only when they actually make sense on
their own. Something like the below perhaps?

	/*
	 * Preserve idle and iowait sleep times across a CPU offline/online
	 * sequence as they are cumulative.
	 */

Hmm?

> +	idle_sleeptime = ts->idle_sleeptime;
> +	iowait_sleeptime = ts->iowait_sleeptime;
>  	memset(ts, 0, sizeof(*ts));
> +	ts->idle_sleeptime = idle_sleeptime;
> +	ts->iowait_sleeptime = iowait_sleeptime;
>  }
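
For reference, the save/clear/restore pattern from the hunk above can be tried in isolation. This is a userspace sketch only: struct ts_sketch and ts_reset_preserving_sleeptimes() are hypothetical names, uint64_t stands in for ktime_t, and the extra fields merely represent state that is supposed to be reset:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct tick_sched. */
struct ts_sketch {
	uint64_t idle_sleeptime;
	uint64_t iowait_sleeptime;
	uint64_t last_tick;	/* example of state that should be cleared */
	int tick_stopped;	/* likewise */
};

/*
 * Preserve idle and iowait sleep times across the reset as they are
 * cumulative. Everything else starts over from zero.
 */
static void ts_reset_preserving_sleeptimes(struct ts_sketch *ts)
{
	uint64_t idle_sleeptime = ts->idle_sleeptime;
	uint64_t iowait_sleeptime = ts->iowait_sleeptime;

	memset(ts, 0, sizeof(*ts));
	ts->idle_sleeptime = idle_sleeptime;
	ts->iowait_sleeptime = iowait_sleeptime;
}
```

The two counters survive the memset() while the rest of the struct is zeroed, so a reader summing them over all CPUs no longer sees the values go backwards.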

Thanks,

tglx