Re: hware clock left bad after a system failure

From: Eyal Lebedinsky
Date: Wed Nov 16 2005 - 16:52:34 EST


linux-os (Dick Johnson) wrote:
> On Wed, 16 Nov 2005, Eyal Lebedinsky wrote:
>
>
>>I recently had two cases where my machine locked up and needed
>>a hard reset. The last time magic SysRq did not respond at all.
>>
>>In these cases I found that the hware clock was set incorrectly
>>and the machine comes up with a bad date. It seems that the clock
>>is ahead by as much as my TZ (+10 in my case). I may be able
>>to understand if it was set 10h behind (kernel set it to UTC)
>>but this is the other way. The machine comes up with UTC+20.
>>
>>Now this is just trouble. The machine comes up and spends 15m
>>fscking. I then reset the clock and reboot and it does the whole
>>fsck again because it thinks the fs was not checked for eons. It
>>does not understand time in the future.
>>
>>So the points are
>>
>>- why is the clock mangled in this way?
>
>
> I am assuming that you have an ix86 kind of machine.
>
> It's probably mangled because you had a hardware-crash.
>
> If you have a driver that accesses the RTC, it needs to leave
> the index register at offset 0 so that a hardware crash can
> only upset the seconds. Otherwise, even the RTC checksum
> can get screwed up, forcing manual reconfiguration of the
> BIOS.
>
> During a hardware-crash, the chip enables may go TRUE. This
> means that an RTC write can occur with junk that's on the
> data-bus.
>
> Now, you need to find out why you had a hardware-crash which
> is quite unlike a software-crash. A hardware crash occurs when
> you turn OFF the power or the power-good line from the
> power-supply goes FALSE. You do not get a hardware-crash from
> hitting the reset button. You may have induced the RTC failure
> if you hit the power switch instead of the reset button.

I hit the reset. In one case I managed to reboot using magic SysRq.

The crashes are related to disk problems. In one case the hda/b
controller went down (last message said DMA disabled on both)
and in another case the system was doing a proper shutdown
when it failed to complete and a reset was necessary. I suspect
a problem with the SATA card which I know has some driver issues
(promise SATA II 150 TX4).

The point of the post was the fact that the clock was not randomly
set but clearly at +20h after the reboot. This was the case in the
last two crashes. This is too coincidental and I suspect that some
logic does play with the RTC and if a proper shutdown does not
complete it may not be restored correctly.

--
Eyal Lebedinsky (eyal@xxxxxxxxxxxxxx) <http://samba.org/eyal/>
attach .zip as .dat
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/