Re: [kernel-sched-cputime] question about probable bug in cputime_adjust()

From: Gustavo A. R. Silva
Date: Wed Jun 28 2017 - 19:58:09 EST


Hi Frans,

Quoting Frans Klaver <fransklaver@xxxxxxxxx>:

On Wed, Jun 28, 2017 at 7:35 AM, Frans Klaver <fransklaver@xxxxxxxxx> wrote:
On Wed, Jun 28, 2017 at 1:03 AM, Gustavo A. R. Silva
<garsilva@xxxxxxxxxxxxxx> wrote:

Hello everybody,

While looking into Coverity ID 1371643 I ran into the following piece of
code at kernel/sched/cputime.c:568:

568/*
569 * Adjust tick based cputime random precision against scheduler runtime
570 * accounting.
571 *
572 * Tick based cputime accounting depend on random scheduling timeslices of a
573 * task to be interrupted or not by the timer. Depending on these
574 * circumstances, the number of these interrupts may be over or
575 * under-optimistic, matching the real user and system cputime with a variable
576 * precision.
577 *
578 * Fix this by scaling these tick based values against the total runtime
579 * accounted by the CFS scheduler.
580 *
581 * This code provides the following guarantees:
582 *
583 *   stime + utime == rtime
584 *   stime_i+1 >= stime_i, utime_i+1 >= utime_i
585 *
586 * Assuming that rtime_i+1 >= rtime_i.
587 */
588static void cputime_adjust(struct task_cputime *curr,
589                           struct prev_cputime *prev,
590                           u64 *ut, u64 *st)
591{
592        u64 rtime, stime, utime;
593        unsigned long flags;
594
595        /* Serialize concurrent callers such that we can honour our guarantees */
596        raw_spin_lock_irqsave(&prev->lock, flags);
597        rtime = curr->sum_exec_runtime;
598
599        /*
600         * This is possible under two circumstances:
601         *  - rtime isn't monotonic after all (a bug);
602         *  - we got reordered by the lock.
603         *
604         * In both cases this acts as a filter such that the rest of the code
605         * can assume it is monotonic regardless of anything else.
606         */
607        if (prev->stime + prev->utime >= rtime)
608                goto out;
609
610        stime = curr->stime;
611        utime = curr->utime;
612
613        /*
614         * If either stime or both stime and utime are 0, assume all runtime is
615         * userspace. Once a task gets some ticks, the monotonicy code at
616         * 'update' will ensure things converge to the observed ratio.
617         */
618        if (stime == 0) {
619                utime = rtime;
620                goto update;
621        }
622
623        if (utime == 0) {
624                stime = rtime;
625                goto update;
626        }
627
628        stime = scale_stime(stime, rtime, stime + utime);
629
630update:
631        /*
632         * Make sure stime doesn't go backwards; this preserves monotonicity
633         * for utime because rtime is monotonic.
634         *
635         *  utime_i+1 = rtime_i+1 - stime_i
636         *            = rtime_i+1 - (rtime_i - utime_i)
637         *            = (rtime_i+1 - rtime_i) + utime_i
638         *            >= utime_i
639         */
640        if (stime < prev->stime)
641                stime = prev->stime;
642        utime = rtime - stime;
643
644        /*
645         * Make sure utime doesn't go backwards; this still preserves
646         * monotonicity for stime, analogous argument to above.
647         */
648        if (utime < prev->utime) {
649                utime = prev->utime;
650                stime = rtime - utime;
651        }
652
653        prev->stime = stime;
654        prev->utime = utime;
655out:
656        *ut = prev->utime;
657        *st = prev->stime;
658        raw_spin_unlock_irqrestore(&prev->lock, flags);
659}
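
For context, scale_stime() (called at line 628) proportionally splits rtime according to the observed stime:utime ratio; roughly, it computes stime * rtime / (stime + utime). A minimal userspace model of that math (the in-kernel version is more careful and avoids full 128-bit division, so treat this only as a sketch):

#include <stdint.h>

/* Userspace model of scale_stime(): proportional split of rtime.
 * Assumes total != 0; the kernel caller guarantees this by filtering
 * the stime == 0 and utime == 0 cases first. */
static uint64_t scale_stime_model(uint64_t stime, uint64_t rtime, uint64_t total)
{
        return (uint64_t)(((unsigned __int128)stime * rtime) / total);
}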


The issue here is that the value assigned to the variable utime at line 619 is
overwritten at line 642, which would make that assignment useless.

It isn't completely useless, since utime is used at line 628 to calculate stime.

Oh, I missed 'goto update'. Never mind about this one.
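
Indeed, for the stime == 0 path the store really is dead: control jumps straight to the clamp at 'update' and utime is recomputed from rtime at line 642. A throwaway userspace trace of that path, with made-up values and simplified types, not the kernel code:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
        uint64_t rtime = 1000, stime = 0, utime = 0;
        uint64_t prev_stime = 100;

        if (stime == 0) {
                utime = rtime;  /* line 619: utime becomes 1000 */
                goto update;    /* line 628 (scale_stime) is skipped */
        }

update:
        if (stime < prev_stime)
                stime = prev_stime;     /* clamp: stime = 100 */
        utime = rtime - stime;          /* line 642: utime = 900; the 1000 was never read */

        printf("stime=%" PRIu64 " utime=%" PRIu64 "\n", stime, utime);
        return 0;
}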


But I suspect that this assignment is not actually intended, and that line
642 should be moved into the if block at line 640. Something similar to
the following patch:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -637,9 +637,10 @@ static void cputime_adjust(struct task_cputime *curr,
          *            = (rtime_i+1 - rtime_i) + utime_i
          *            >= utime_i
          */
-        if (stime < prev->stime)
+        if (stime < prev->stime) {
                 stime = prev->stime;
-        utime = rtime - stime;
+                utime = rtime - stime;
+        }


If you confirm this, I will send a patch in full and proper form.

I'd really appreciate your comments.

If you do that, how would you meet the guarantee made at line 583?


You are right, I see now.
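
To spell it out with made-up numbers: if the scaled stime is already >= prev->stime, the clamp is not taken, and with my change utime would never be derived from rtime, so the "stime + utime == rtime" guarantee breaks. A minimal sketch (hand-picked values, nothing from a real trace):

#include <assert.h>
#include <stdint.h>

int main(void)
{
        uint64_t rtime = 1000;
        uint64_t stime = 700;           /* result of scale_stime(), say */
        uint64_t utime = 200;           /* still the raw tick-based value */
        uint64_t prev_stime = 500;

        if (stime < prev_stime) {       /* false here */
                stime = prev_stime;
                utime = rtime - stime;  /* the only place utime would be fixed up */
        }

        assert(stime + utime == 900);   /* != rtime (1000): guarantee violated */
        return 0;
}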

In that case, the following patch would be the way to go:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -615,10 +615,8 @@ static void cputime_adjust(struct task_cputime *curr,
          * userspace. Once a task gets some ticks, the monotonicy code at
          * 'update' will ensure things converge to the observed ratio.
          */
-        if (stime == 0) {
-                utime = rtime;
+        if (stime == 0)
                 goto update;
-        }
 
         if (utime == 0) {
                 stime = rtime;


but I think this one is even better:


--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -615,19 +615,11 @@ static void cputime_adjust(struct task_cputime *curr,
          * userspace. Once a task gets some ticks, the monotonicy code at
          * 'update' will ensure things converge to the observed ratio.
          */
-        if (stime == 0) {
-                utime = rtime;
-                goto update;
-        }
-
-        if (utime == 0) {
+        if (stime != 0 && utime == 0)
                 stime = rtime;
-                goto update;
-        }
-
-        stime = scale_stime(stime, rtime, stime + utime);
+        else
+                stime = scale_stime(stime, rtime, stime + utime);
 
-update:
         /*
          * Make sure stime doesn't go backwards; this preserves monotonicity
          * for utime because rtime is monotonic.


What do you think?

Thank you!
--
Gustavo A. R. Silva