Re: Linux 2.4.20 RTC Timer bug

From: Willy Tarreau (willy@w.ods.org)
Date: Thu Jul 17 2003 - 14:17:06 EST


Hi Dick,

On Thu, Jul 17, 2003 at 07:25:39AM -0400, Richard B. Johnson wrote:
> On Wed, 16 Jul 2003, Willy Tarreau wrote:
>
> > Dick,
> >
> > 0x71 is the DATA register ! The thing that modifies what you read from it
> > is the RTC clock itself, because seconds are stored at index 0x00 IIRC, which
> > is often assumed if you read without writing.
> >
> > maybe it's time to go to bed ? :-)
> >
> > Cheers,
> > Willy
>
> It's so easy to kill the messenger instead of finding the problem.

My apologies Dick, I didn't notice your mention of the range (0 to 89) in your
original mail. I agree with you, it cannot (or at least, should not) be the
clock, so there's certainly something playing with it.

> Most modern RTC emulations will return 0xff when you read the index
> register at 0x70 because it's a write-only register. Therefore, to
> discover what it has been set to, one must read the data register at
> 0x71. If it increments at one second intervals from 0 to 59 (BCD),
> (you change the "%d" to "%x" to read BCD within that range), then
> the index register has been left at 0. This is okay except that the
> time may get trashed upon power off.
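
For reference, the plausibility check described above can be sketched in C
roughly as follows. This is only an illustration under the assumption that
the RTC is in BCD mode; the function names are made up for the example, and
the byte is passed in rather than read from the port so that the sketch does
not depend on any particular port-I/O interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if both nibbles are decimal digits, i.e. the byte is valid packed BCD. */
bool is_packed_bcd(uint8_t raw)
{
    return (raw & 0x0F) <= 9 && (raw >> 4) <= 9;
}

/* Convert a packed-BCD byte (e.g. 0x59) to its binary value (59). */
unsigned bcd_to_bin(uint8_t raw)
{
    return (unsigned)(raw >> 4) * 10u + (raw & 0x0Fu);
}

/*
 * Hypothetical helper: could a byte read from the data register (0x71)
 * be the seconds register, i.e. was the index most likely left at 0?
 * A single sample proves nothing; one would watch it tick for a while.
 */
bool looks_like_seconds(uint8_t raw)
{
    return is_packed_bcd(raw) && bcd_to_bin(raw) <= 59;
}
```

A value such as 0x89 is valid BCD but fails looks_like_seconds(), which is
the kind of reading that rules out the seconds register.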

I agree. This reminds me of two (broken?) clocks I encountered about 4 years
ago. One of them would increment hours up to 99 if you set it by hand to
something bigger than 23, and after that it stayed stuck at 99. This clearly
shows an event-driven mechanism which only jumps to zero when the value
changes to 24! The other one was funnier: it would cycle within the ten you
initialized it to (it could only increment tens from 0 to 1, then to 2). So
if you initialized it to 35, it would run forever from 30 to 39, then back to
30. And if you set it to 25, it would run up to 29, then jump to 20 and get
back to something normal.

I don't remember if I played with seconds, though.
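
A toy C model may make the two failure modes easier to follow. It is only one
reading of the symptoms described above, not the actual logic of either chip:
the function names are invented, and hours are kept as plain integers rather
than BCD for simplicity.

```c
/*
 * First clock: counts up, wraps to 0 only at the moment the value
 * becomes 24, and sticks at 99 once it gets there.
 */
unsigned step_hours_saturating(unsigned h)
{
    if (h >= 99)
        return 99;              /* stuck at the top */
    h += 1;
    return (h == 24) ? 0 : h;   /* wrap only on the transition to 24 */
}

/*
 * Second clock: the units digit counts 0..9 normally, but the tens
 * digit can only step 0 -> 1 -> 2. Any other tens digit never changes,
 * so the value cycles inside its own ten.
 */
unsigned step_hours_tens_limited(unsigned h)
{
    unsigned tens = h / 10;
    unsigned units = h % 10;

    if (units < 9) {
        units++;
    } else {
        units = 0;
        if (tens < 2)
            tens++;             /* only 0 -> 1 and 1 -> 2 are possible */
    }
    h = tens * 10 + units;
    return (h == 24) ? 0 : h;   /* same wrap-on-24 rule as above */
}
```

Starting the second model at 35 and stepping it repeatedly keeps it between
30 and 39, while starting at 25 takes it through 29, back to 20, and then
into the normal 0 to 23 cycle.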

> In machines tested here, running linux-2.4.20, the value read from
> 0x71 increments from 0 to 99 with a few missing codes in-between so
> it's not possible to guess what it's been set to, maybe the
> 'B' register (status), then something else. That something else
> is the killer.

Just a silly question: have you tried within VMware, or on other hardware?

> When the power fails, most all systems running Linux will fail to
> restart because of CMOS corruption. You can easily check. Run linux,
> `init 1`, dismount drives, then pull the plug. Don't use the
> front panel power switch because, again, modern power supplies
> protect devices during 'normal' shutdown by using the reset
> circuitry.

I never noticed, but will probably try harder. I'm interested in such problems
because I use PC-based semi-embedded boxes at work. If this is the case, it
would mean that Linux continually modifies checksummed portions of the CMOS.
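
To illustrate what "checksummed portions" means here, the sketch below
verifies the classic AT-style CMOS checksum over a 128-byte image. It assumes
the traditional layout, where bytes 0x10 to 0x2D are summed and the 16-bit
result is stored high byte first at 0x2E and 0x2F; individual BIOSes may
deviate from this, and the names used are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of the traditional AT layout (an assumption, see above). */
enum {
    CMOS_SUM_FIRST = 0x10,  /* first byte covered by the checksum */
    CMOS_SUM_LAST  = 0x2D,  /* last byte covered by the checksum  */
    CMOS_SUM_HIGH  = 0x2E,  /* stored checksum, high byte         */
    CMOS_SUM_LOW   = 0x2F   /* stored checksum, low byte          */
};

/* Sum the covered bytes of a 128-byte CMOS image, modulo 65536. */
uint16_t cmos_compute_sum(const uint8_t cmos[128])
{
    uint16_t sum = 0;
    for (size_t i = CMOS_SUM_FIRST; i <= CMOS_SUM_LAST; i++)
        sum = (uint16_t)(sum + cmos[i]);
    return sum;
}

/* True if the stored checksum matches the computed one. */
bool cmos_sum_ok(const uint8_t cmos[128])
{
    uint16_t stored =
        (uint16_t)((cmos[CMOS_SUM_HIGH] << 8) | cmos[CMOS_SUM_LOW]);
    return stored == cmos_compute_sum(cmos);
}
```

Under this layout the time and status registers (0x00 to 0x0D) lie outside
the summed range, so a ticking clock alone would not invalidate the checksum;
only a write into 0x10 to 0x2F would.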

BTW, while I was playing with hours > 23, I noticed that both DOS and
Windows95 got a divide error during boot under such conditions. That's what
led me to the real problem in fact, because only Linux booted OK and I was
beginning to scratch my head a lot.

Cheers,
Willy



