Re: 2.6.25.9: system clocks works normally then speeds up 4x...

From: Philippe Troin
Date: Wed Jul 09 2008 - 16:58:27 EST


john stultz <johnstul@xxxxxxxxxx> writes:

> On Wed, 2008-07-09 at 13:01 -0700, Philippe Troin wrote:
> > "john stultz" <johnstul@xxxxxxxxxx> writes:
> >
> > > On Wed, Jul 9, 2008 at 12:21 PM, Philippe Troin <phil@xxxxxxxx> wrote:
> > > >
> > > > Symptoms:
> > > >
> > > > The system boots fine. Clock seems to run normally.
> > > >
> > > > Then after a random amount of time (on the current boot, 3 days),
> > > > clock starts to be running 2-4x faster (on the current boot, 4x).
> > > >
> > > > I have tried booting with "nohz=off highres=off" but it does not
> > > > help.
> > >
> > > Could you provide the output from the following:
> > > sudo cat /sys/devices/system/clocksource/clocksource0/*
> >
> > Sure.
> >
> > It is:
> > available: jiffies tsc
> > current: jiffies
> >
> > > Did this issue occur with 2.6.24 or earlier kernels?
> >
> > No. It started with 2.6.25.
> >
> > Interestingly:
> >
> > I've just modified the current clocksource to tsc and the clock went
> > back to its normal speed.
> >
> > Then I reset the current clocksource to jiffies, and the clock went
> > back to its (wrong) 4x speed.
> >
> > So it looks like the kernel is counting jiffies 4x too fast.
>
> When you're seeing the issue, can you do the following:
> cat /proc/interrupts > interrupts
>
> <wait 10 seconds by your wristwatch>
>
> cat /proc/interrupts >> interrupts
>
> And send the results?

There you are:

           CPU0       CPU1
  0:        353          0   IO-APIC-edge      timer
  1:          0          8   IO-APIC-edge      i8042
  2:          0          0   XT-PIC-XT         cascade
  3:          0          2   IO-APIC-edge
  4:      32796         68   IO-APIC-edge      serial
  8:          1          0   IO-APIC-edge      rtc
 14:     665397      37592   IO-APIC-edge      pata_via
 15:          0          0   IO-APIC-edge      pata_via
 16:   11417314     784937   IO-APIC-fasteoi   ohci_hcd:usb2, aic7xxx, firewire_ohci
 17:   11695442    1165240   IO-APIC-fasteoi   ohci_hcd:usb3, eth1
 18:   14967468    1533627   IO-APIC-fasteoi   ehci_hcd:usb1, eth0
 19:    1526542     363432   IO-APIC-fasteoi   uhci_hcd:usb4, eth2
NMI:          0          0   Non-maskable interrupts
LOC:  546305845   33155722   Local timer interrupts
RES:    4502087    5460357   Rescheduling interrupts
CAL:     816244    3856944   function call interrupts
TLB:     604097    1266758   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0

Roughly 10 seconds later:

           CPU0       CPU1
  0:        353          0   IO-APIC-edge      timer
  1:          0          8   IO-APIC-edge      i8042
  2:          0          0   XT-PIC-XT         cascade
  3:          0          2   IO-APIC-edge
  4:      32796         68   IO-APIC-edge      serial
  8:          1          0   IO-APIC-edge      rtc
 14:     665481      37592   IO-APIC-edge      pata_via
 15:          0          0   IO-APIC-edge      pata_via
 16:   11417335     784937   IO-APIC-fasteoi   ohci_hcd:usb2, aic7xxx, firewire_ohci
 17:   11695614    1165240   IO-APIC-fasteoi   ohci_hcd:usb3, eth1
 18:   14967672    1533627   IO-APIC-fasteoi   ehci_hcd:usb1, eth0
 19:    1526542     363432   IO-APIC-fasteoi   uhci_hcd:usb4, eth2
NMI:          0          0   Non-maskable interrupts
LOC:  546361653   33156517   Local timer interrupts
RES:    4502100    5460379   Rescheduling interrupts
CAL:     816244    3856944   function call interrupts
TLB:     604097    1266758   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0
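
To save some mental arithmetic, the difference between the two snapshots can be worked out with a short script. This is only a minimal sketch: the helper names (parse_counts, deltas) are made up for illustration, only the timer-related lines are copied in, and the 10-second interval is an assumption since it was timed by wristwatch.

```python
# Timer-related lines copied from the two /proc/interrupts snapshots above.
BEFORE = """\
  0: 353 0 IO-APIC-edge timer
LOC: 546305845 33155722 Local timer interrupts
"""
AFTER = """\
  0: 353 0 IO-APIC-edge timer
LOC: 546361653 33156517 Local timer interrupts
"""

INTERVAL_S = 10.0  # assumption: timed by wristwatch, so only approximate


def parse_counts(snapshot, n_cpus=2):
    """Map each interrupt label to its list of per-CPU counters."""
    counts = {}
    for line in snapshot.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # header or wrapped continuation line
        values = []
        for field in rest.split()[:n_cpus]:
            if not field.isdigit():
                break  # reached the controller/description text
            values.append(int(field))
        if values:
            counts[label.strip()] = values
    return counts


def deltas(before, after):
    """Per-CPU counter increase for every label present in both snapshots."""
    b, a = parse_counts(before), parse_counts(after)
    return {k: [y - x for x, y in zip(b[k], a[k])] for k in b if k in a}


delta = deltas(BEFORE, AFTER)
for label, values in delta.items():
    rates = ", ".join(f"{v / INTERVAL_S:.1f}/s" for v in values)
    print(f"{label:>3}: delta={values} rate={rates}")
```

With these numbers, IRQ 0 does not advance at all, while LOC advances by 55808 on CPU0 and 795 on CPU1. If the interval really was 10 seconds, that is roughly 5581 local timer interrupts per second on CPU0 and about 80 per second on CPU1.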

> Could you also try booting with noapic to see if that changes anything?

Sure, but rebooting means I will lose the "wedged" system. Is there
anything else that should be checked on it before I lose the broken
state?

Also keep in mind that the symptoms take a while to manifest
themselves (typically a few days).

Phil.