Re: Interesting interaction between lguest and CFS
From: Ingo Molnar
Date: Fri Jun 08 2007 - 05:34:57 EST
* Matt Mackall <mpm@xxxxxxxxxxx> wrote:
> On Tue, Jun 05, 2007 at 09:50:15PM +0200, Ingo Molnar wrote:
> > Matt, could you run this for 1-2 minutes and send us the sched_debug.txt
> > output?
>
> http://selenic.com/sched_debug.txt.gz
thanks. It shows the anomaly in action:
now at 365294215491977 nsecs
.jiffies : 91248553
.next_balance : 0
.curr->pid : 18589
.clock : 125652924079659272
.prev_clock_raw : 365201238127457
.clock_warps : 9
.clock_unstable_events : 61896358
.clock_max_delta : 3997813
next one is:
now at 365295219388142 nsecs
.jiffies : 91248804
.next_balance : 0
.curr->pid : 18591
.clock : 125653018059166371
.prev_clock_raw : 365295217642619
.clock_warps : 9
.clock_unstable_events : 61896359
.clock_max_delta : 92976502936
251 jiffies passed, at 250 Hz that's 1 second - this proves that the
sample is indeed an accurate once-per-second sample according to the
timer interrupt. The 'now' timestamp (ktime_get() based) shows
1003896165 nanosecs passed - this too is showing a precise 1 second
sample, according to GTOD.
So all the time references we have show that (no surprise here) 1 second
passed between the two samples. But sched_clock() shows a _large_ jump:
.clock : 125652924079659272
.clock : 125653018059166371
also reflected in .clock_max_delta:
.clock_max_delta : 92976502936
that's a 93 seconds jump (!) in a single 1-second sample. We also had a
single sched-clock-unstable event:
.clock_unstable_events : 61896358
.clock_unstable_events : 61896359
could you please try a test-boot with 'notsc' - do the scheduling
weirdnesses go away? Also,
There are two reasons why the sched_clock() in -mm could behave like
that - either the sched-clock-share patches in it are buggy and we do
not smoothly switch over from sc->unstable == 1 to sc->unstable == 0, or
the TSC itself is unstable. To test the latter theory, could you run a 1
minute tsc-dump on your box:
./tsc-dump > tsc-dump.txt
You can pick tsc-dump up from:
http://redhat.com/~mingo/time-warp-test/
(please run this on a recent -mm kernel so that we have the same
ACPI-idle characteristics as on the buggy kernel.)
to test the former theory, could you boot with 'notsc' - do the
weirdnesses go away? (please create another sched-debug.txt as well)
Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/