Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

From: Paweł Sikora
Date: Tue Sep 25 2012 - 13:04:15 EST


On Tuesday 25 of September 2012 09:44:54 Greg KH wrote:
> On Tue, Sep 25, 2012 at 06:31:36PM +0200, Paweł Sikora wrote:
> > On Monday 24 of September 2012 10:36:33 Greg KH wrote:
> > > On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> > > > Hi,
> > > >
> > > > with the new stable line i'm observing strange locks on my old amd-phenom-II mini-server.
> > > > here's a dmesg:
> > >
> > > Did this show up in 3.5.3? If not, can you run 'git bisect' to find the
> > > problem patch?
> >
> > heh, the good old kernel shed some light on this issue.
> >
> > Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
> > Sep 25 08:50:24 nexus kernel: [60330.325477] ------------[ cut here ]------------
> > Sep 25 08:50:24 nexus kernel: [60330.325484] WARNING: at /home/users/builder/rpm/BUILD/kernel-2.6.37.6/linux-2.6.37/net/sched/sch_generic.c:258 dev_watchdog+0x25d/0x270()
> > Sep 25 08:50:24 nexus kernel: [60330.325486] Hardware name: GA-MA785GMT-UD2H
> > Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> > (...)
> > Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
> >
> > afaics, this amd-phenom cpu does cpu frequency scaling, which makes the plain 'tsc'
> > clocksource unstable and leads to the network card watchdog timeout (i can log in via
> > the local console while all network traffic is dead). on the recent 3.5.x kernel the
> > 'clocksource unstable' message appears *after* the 'task blocked' flood and there's no
> > clear info about a watchdog timeout. currently i'm testing the hpet clocksource because
> > the better tsc modes (constant_tsc, nonstop_tsc) aren't present in
> > /sys/devices/system/clocksource/clocksource0/available_clocksource although the
> > cpu supports them.
>
> I'm sorry, I don't understand, that's a 2.6.37 kernel you are comparing
> this to. Where did this problem show up? In 3.5.4 where 3.5.3 was
> fine?

the 'cpu-stall' from the subject first appeared in 3.5.2 (after an upgrade from 3.4.10).
3.5.4 has the same problem as 3.5.2, so i went back to the initial 2.6.37.6,
which had worked fine for many months. now i'm pretty sure that all these problems
are related to tsc instability and appear on different kernels in different forms.
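
for anyone who wants to compare their own box, here is a minimal sketch (not a
definitive tool) that prints the available and current clocksource and checks
which tsc-related cpu flags are reported. it assumes the usual sysfs layout
under /sys/devices/system/clocksource/clocksource0 and that constant_tsc and
nonstop_tsc show up as cpu feature flags in /proc/cpuinfo, which would explain
why they are not listed as separate entries in available_clocksource. the
helper names (parse_clocksources, tsc_flags, read_text) are made up for this
example.

```python
from pathlib import Path

# Assumed sysfs location of the clocksource controls.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")
# CPU feature flags of interest (looked up in /proc/cpuinfo, not in sysfs).
TSC_FLAGS = ("constant_tsc", "nonstop_tsc")


def parse_clocksources(text):
    """Split the whitespace-separated contents of available_clocksource."""
    return text.split()


def tsc_flags(cpuinfo_text):
    """Return which of TSC_FLAGS appear on the first 'flags' line of cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in TSC_FLAGS if flag in present]
    return []


def read_text(path):
    """Read a file, returning an empty string if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    available = parse_clocksources(read_text(CLOCKSOURCE_DIR / "available_clocksource"))
    current = read_text(CLOCKSOURCE_DIR / "current_clocksource").strip()
    print("available:", available)
    print("current:  ", current or "(unknown)")
    print("tsc flags:", tsc_flags(read_text("/proc/cpuinfo")))
```

if 'hpet' is in the available list, it can be selected at runtime as root by
writing the name to current_clocksource in the same directory, or at boot with
the clocksource=hpet kernel parameter.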
