Re: 2.1.102 and APM -- is the patch correct?

C. Scott Ananian (cananian@lcs.mit.edu)
Fri, 15 May 1998 05:43:51 -0400 (EDT)


On Fri, 15 May 1998, Linus Torvalds wrote:

> On Fri, 15 May 1998, C. Scott Ananian wrote:

> > I've got a laptop with APM support running Linux. What's a good way to
> > torture test it with do_fast_gettimeoffset? I'm looking for a
> > divide-by-zero oops, right?
>
> The more I think about it, the more I have a little voice saying "suspend
> to disk, suspend to disk".
>
> I don't know how suspend to disk works, but I suspect it may be turning
> off the whole CPU, and in that case the cycle counter will certainly be a
> goner when we resume, unless some CPU engineer has come up with something
> really revolutionary.

Hmm. We can't restore the top 32 bits of the cycle counter, can we...
But the machine has to run for more than 2^32 clock cycles before the
top-word trashing becomes evident. Hm. That's only about 21 seconds on my
200MHz machine (200MHz = 200 million cycle counter increments per second,
right?). After the suspend to disk, the timeoffset will speed *way* up
(apparent kernel time just got truncated to something under 21 seconds, so
the computed 'average internal clocks per jiffy' figure plummets),
*but* it's hard-limited at the end of do_fast_gettimeoffset to
USECS_PER_JIFFY, and the system time will get corrected every jiffy.
So only intra-jiffy times will be screwed up...
[Granted, I haven't lived in this code long, so the above analysis might
be all wrong.]
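
A rough sketch of the arithmetic above, in case it helps anyone check my
numbers. This is a simplified model, not the kernel's code: ASSUMED_HZ = 100
and the helper names low_word_wrap_seconds() and model_time_offset() are
made up for illustration. It only shows the 2^32-cycle wrap time and how an
underestimated clocks-per-jiffy figure makes the offset saturate at
USECS_PER_JIFFY.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 100 timer ticks per second, so one jiffy is 10000 usec. */
#define ASSUMED_HZ 100
#define USECS_PER_JIFFY (1000000 / ASSUMED_HZ)

/* Seconds until the low 32 bits of a cycle counter wrap at clock_hz. */
double low_word_wrap_seconds(double clock_hz)
{
    assert(clock_hz > 0.0);
    return 4294967296.0 / clock_hz;   /* 2^32 cycles */
}

/*
 * Simplified model of an intra-jiffy offset: scale the cycles seen since
 * the last timer tick by the estimated cycles per jiffy, then clamp the
 * result so it never exceeds one jiffy.
 */
unsigned long model_time_offset(uint32_t cycles_since_tick,
                                uint32_t est_cycles_per_jiffy)
{
    uint64_t usecs;

    assert(est_cycles_per_jiffy > 0);
    usecs = (uint64_t)cycles_since_tick * USECS_PER_JIFFY
            / est_cycles_per_jiffy;
    if (usecs > USECS_PER_JIFFY)
        usecs = USECS_PER_JIFFY;      /* the hard limit */
    return (unsigned long)usecs;
}
```

With a sane estimate of 2,000,000 cycles per jiffy (200MHz at the assumed
HZ), 1,000,000 elapsed cycles map to 5000 usec. If the estimate collapses
to, say, 20,000, the same input hits the clamp and stays at 10000 usec for
the rest of the jiffy.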

You would test this by calling gettimeofday() repeatedly in a tight loop.
If the cycle counter has been broken (truncated) then the difference
between two return values of gettimeofday will be exactly 0 for the
majority of the jiffy.
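
A minimal user-space sketch of that test, assuming a POSIX gettimeofday().
The function name count_zero_deltas() is hypothetical; it just counts how
many back-to-back calls return the identical timestamp.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() in a tight loop and count how many consecutive
 * pairs of results are exactly equal (zero apparent elapsed time).
 */
long count_zero_deltas(long iterations)
{
    struct timeval prev, now;
    long zeros = 0;
    long i;

    assert(iterations >= 0);
    gettimeofday(&prev, NULL);
    for (i = 0; i < iterations; i++) {
        gettimeofday(&now, NULL);
        if (now.tv_sec == prev.tv_sec && now.tv_usec == prev.tv_usec)
            zeros++;
        prev = now;
    }
    return zeros;
}
```

Call it with a large iteration count before and after a suspend and compare
the results. On a fast machine two calls can legitimately land in the same
microsecond, so a few zeros prove nothing; the thing to look for is zeros
making up most of the samples.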

I'm guessing the kernel oopsen are due to some kernel code somewhere
dividing by the difference in gettimeofday() results? If so, this should
be fixed, because even a small rate estimation error, coupled with the
hard limiting at the end of do_fast_gettimeoffset, could result in zero
apparent time differences near the tail end of a jiffy. It will be rare
for non-broken TSCs, but still remotely possible.
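
To illustrate the kind of fix I mean (I have not found the offending code,
so this is a generic sketch with made-up names elapsed_usecs() and
rate_per_second(), not a patch): any rate computed from two timestamps
should check for a zero or negative interval before dividing.

```c
#include <sys/time.h>

/* Microseconds from *start to *end; may be zero or negative. */
long long elapsed_usecs(const struct timeval *start,
                        const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
           + (end->tv_usec - start->tv_usec);
}

/*
 * Events per second over the interval. Returns -1 when no measurable
 * time has passed, instead of dividing by zero.
 */
long long rate_per_second(long long events,
                          const struct timeval *start,
                          const struct timeval *end)
{
    long long dt = elapsed_usecs(start, end);

    if (dt <= 0)
        return -1;                    /* caller must handle "unknown" */
    return events * 1000000LL / dt;
}
```

Returning a sentinel is only one option; a caller could just as well retry
the measurement over a longer interval.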

<tangent>
I wish I could check this out myself, but my laptop has always hung under
Linux whenever I've tried suspend-to-disk. I started hacking the APM code
a couple of months ago to try to track this down, with no luck. [I did get
APM 1.2 implemented in the kernel, though. ;-) ] I had gotten reports that
the BSD APM code worked in cases where the Linux code didn't, so I even
scanned line-by-line through the BSD source looking for apm implementation
differences... without any luck. And of course, it *works* in Windows.
Argh!

One of these days I'm going to properly instrument my machine and
single-step through the APM code to track this bugger down...
</tangent>
--Scott
@ @
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-oOO-(_)-OOo-=-=-=-=-=
C. Scott Ananian: cananian@lcs.mit.edu / Declare the Truth boldly and
Laboratory for Computer Science/Crypto / without hindrance.
Massachusetts Institute of Technology /META-PARRESIAS AKOLUTOS:Acts 28:31
-.-. .-.. .. ..-. ..-. --- .-. -.. ... -.-. --- - - .- -. .- -. .. .- -.
PGP key available via finger and from http://www.pdos.lcs.mit.edu/~cananian
