Re: Regression in 2.6.27 caused by commit bfc0f59

From: Alok Kataria
Date: Tue Sep 02 2008 - 14:41:59 EST


On Tue, 2008-09-02 at 11:14 -0700, Thomas Gleixner wrote:
> On Tue, 2 Sep 2008, Linus Torvalds wrote:
> > > So what I'm working on is an algorithm, which is similar to the checks
> > > in the tsc_read_refs() function. That should allow us to detect
> > > whether one of the reads is way off by doing a min/max detection. In
> > > such a case we can either repeat the calibration or try to figure out
> > > whether the pmtimer / hpet can provide us with some useful reference.
> >
> > I think the most trivial approach would be to
> >
> > - just keep track of the max TSC difference for each loop iteration.
> >
> > - if the max TSC is bigger than 1% of the total TSC, then something is
> > already seriously wrong (either we had very few loops indeed, or some
> > of them were very expensive)
>
> I went for summing up the deltas and building an average at the
> end. That's from a loop of 10 consecutive runs:
>
> [ 0.000000] TSC min 2160 max 3732 avg 3266 pitcnt 30614
> [ 0.000000] TSC min 2160 max 1036164 avg 3299 pitcnt 30310
> [ 0.000000] TSC min 2160 max 1032360 avg 3303 pitcnt 30277
>
> [ 0.000000] TSC min 2160 max 210453018 avg 69509 pitcnt 30260
>
> Hit very late in the loop, as pitcnt is close to the others
>
> [ 0.000000] TSC min 2160 max 3708 avg 3265 pitcnt 30624
> [ 0.000000] TSC min 2160 max 3720 avg 3265 pitcnt 30622
> [ 0.000000] TSC min 2160 max 1062252 avg 3301 pitcnt 30287
> [ 0.000000] TSC min 2160 max 3756 avg 3267 pitcnt 30605
> [ 0.000000] TSC min 2160 max 3732 avg 3267 pitcnt 30605
> [ 0.000000] TSC min 2136 max 989292 avg 3297 pitcnt 30324
> [ 0.000000] TSC min 2136 max 3744 avg 3266 pitcnt 30612
>
> [ 0.000000] TSC min 2160 max 78042006 avg 78045 pitcnt 1001
>
> This one hit early in the loop as pitcnt is pretty low.
>
> The min value is pretty constant.
>
> The max value for sane loops is in the range of 3708 - 3756, the
> average is between 3266 and 3267.
>
> For those which have a ~500us maximum the average is still in a sane
> range. That seems to be a single glitch, which pushes the maximum, but
> does not really influence the average result.
>
> The outstanding one is the 100ms (210 453 018 ticks), where the average
> is also off by a factor of 20.
>
> I think that information is enough to give us a pretty precise idea
> when to discard the result. I'm currently looking at the hpet/pmtimer
> values for comparison and I should have a patch for testing ready
> later tonight.
>
Sorry for joining the party this late... I am still going through all
my mail.

OK, so from what I understand so far, we will calibrate the TSC against
the PIT, as was done in the 32-bit code, and use that as the default. If
that fails to give sane results, we will fall back to calibrating
against the PM timer or HPET?
Thomas has already explained the problem with the 32-bit calibration
(i.e. it runs only against the PIT, with no checks for SMIs and the
like), but I would like to point out that this problem is much worse in
a virtualized environment, because we may fail to get sane values even
from multiple loops of calibrating against the PIT.
If we have a fallback mechanism that detects such an SMI event and then
tries calibrating against the PM timer or HPET, we should be good.
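
For reference, the min/max/avg bookkeeping quoted above could be
sketched roughly as follows. This is a standalone userspace
illustration, not the pending patch: the struct and function names are
hypothetical, and the discard rule is only the "max bigger than 1% of
the total" check suggested earlier in the thread, which may not be the
threshold the final code uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Running statistics over the per-iteration TSC deltas of one PIT loop. */
struct tsc_loop_stats {
	uint64_t min;	/* smallest delta seen */
	uint64_t max;	/* largest delta seen */
	uint64_t sum;	/* total TSC ticks spent in the loop */
	size_t count;	/* number of iterations (pitcnt in the log) */
};

static void tsc_stats_init(struct tsc_loop_stats *s)
{
	s->min = UINT64_MAX;
	s->max = 0;
	s->sum = 0;
	s->count = 0;
}

/* Record the TSC delta of one loop iteration. */
static void tsc_stats_add(struct tsc_loop_stats *s, uint64_t delta)
{
	if (delta < s->min)
		s->min = delta;
	if (delta > s->max)
		s->max = delta;
	s->sum += delta;
	s->count++;
}

/* Average delta, or 0 if nothing was recorded. */
static uint64_t tsc_stats_avg(const struct tsc_loop_stats *s)
{
	return s->count ? s->sum / s->count : 0;
}

/*
 * Assumed discard rule: the run is suspect if a single iteration
 * accounts for more than 1% of the total TSC ticks, or if the loop
 * never ran at all.
 */
static bool tsc_stats_disturbed(const struct tsc_loop_stats *s)
{
	if (s->count == 0)
		return true;
	return s->max > s->sum / 100;
}
```

A caller would feed each iteration's delta to tsc_stats_add() and, when
tsc_stats_disturbed() returns true, either repeat the PIT loop or fall
back to the PM timer / HPET reference.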

Anyway, I will wait to see the patch.

Thanks,
Alok


