Re: Interesting interaction between lguest and CFS

From: Ingo Molnar
Date: Tue Jun 05 2007 - 15:50:57 EST



* Dmitry Adamushko <dmitry.adamushko@xxxxxxxxx> wrote:

> On 05/06/07, Dmitry Adamushko <dmitry.adamushko@xxxxxxxxx> wrote:
> >> now at 257428593818894 nsecs
> >>
> >> cpu: 0
> >> .nr_running : 3
> >> .raw_weighted_load : 2063
> >> .nr_switches : 242830075
> >> .nr_load_updates : 30172063
> >> .nr_uninterruptible : 0
> >> .jiffies : 64282148
> >> .next_balance : 0
> >> .curr->pid : 27182
> >> .clock : 125650217819008823
> >> .prev_clock_raw : 257428516403535
> >
> >The delta (clock - prev_clock_raw) looks insane.
> >
>
> err.. please ignore it. 'clock' and 'prev_clock_raw' follow each other
> only as long as the 'clock_warps' == 0.. otherwise 'delta' just
> doesn't make any sense. And in your case:
>
> .clock_warps : 9

yeah. Look at the sched_clock() reliability counts:

.clock_warps : 9
.clock_unstable_events : 41133344
.clock_max_delta : 3954619

i.e. a quite large 'tsc unstable events' counter and with a handful
negative warps detected, the actual maximum TSC delta was 3954 usecs
(clock_max_delta), so i think rq_clock() behaved sane and sched_clock()
has good quality on Matt's box. So the large block_max is still
mysterious to me:

> > sleep_max : 57476665627
> > block_max : 18014060106626075

because the whole rq_clock() logic is about avoiding large jumps. So i
dont see how 'now' could _ever_ have ended up being so very large. Even
if the new sched-clock code in arch/i386/kernel/sched-clock.c frequently
switches between sc->unstable == 0 or 1, (i.e. frequently switches
between jiffies based and TSC based clock), the rq_clock() logic should
cushion off any jumpiness or small warps from that. I'm wondering
whether this line:

r = (jiffies_64 - sc->sync_base) * (1000000000 / HZ);

could produce funnies. But even if it does, it could at most produce a
very large _positive_ jump - which they would have to show up in
clock_max_delta. Negative jumps are completely ignored by rq_clock().

although ... the 'raw' rq_clock() values are very large:

.clock : 125650217819008823

$ printf "%016Lx\n" 125650217819008823
01be6635170d6f37

which could be due to some TSC corruption? So perhaps somewhere we
calculated block_max with 'now' zero, hence resulting in that very large
value? Still, i dont see how that could happen. Weird.

it would be nice to monitor how 'clock' changes in time on that box. It
should be advancing monotonically and at most with the speed of the CPU
clock. (but it can move slower, over C3/unstable/cpufreq periods.) This
could be done the following way:

while sleep 1; do cat /proc/sched_debug >> sched_debug.txt; done

Matt, could you run this for 1-2 minutes and send us the sched_debug.txt
output?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/