Re: Sun GEM PPC32 Bug?

From: Benjamin Herrenschmidt
Date: Mon Feb 07 2011 - 00:35:00 EST


What's your machine model (cat /proc/cpuinfo) and what do you do to
trigger the problem? I'm trying to reproduce it here and so far have
had no success.

Cheers,
Ben.

On Sun, 2011-02-06 at 16:01 +0100, R. Herbst wrote:
> On 06.02.2011 00:45, Benjamin Herrenschmidt wrote:
> >
> >
> > Actually, the second one is trivial, just modify gem_rxmac_interrupt()
> > as follows:
> >
> >      if (rxmac_stat & MAC_RXSTAT_OFLW) {
> >              u32 smac = readl(gp->regs + MAC_SMACHINE);
> >
> >              netdev_err(dev, "RX MAC fifo overflow smac[%08x]\n", smac);
> >              gp->net_stats.rx_over_errors++;
> >              gp->net_stats.rx_fifo_errors++;
> >
> > -            ret = gem_rxmac_reset(gp);
> > +            ret = 1;
> >      }
> >
> > And tell us if that makes a difference.
> >
> > Cheers,
> > Ben.
> >
>
> Okay. I have made the change. The only difference is that:
>
> In /var/log/messages
> Feb 6 15:52:12 G4 kernel: gem 0002:20:0f.0: eth0: RX MAC fifo
> overflow smac[00810400]
> Feb 6 15:52:12 G4 kernel: gem 0002:20:0f.0: eth0: Link is up at 1000
> Mbps, full-duplex
> Feb 6 15:52:12 G4 kernel: gem 0002:20:0f.0: eth0: Pause is disabled
> Feb 6 15:57:10 G4 kernel: NETDEV WATCHDOG: eth0 (gem): transmit queue
> 0 timed out
> Feb 6 15:57:10 G4 kernel: ------------[ cut here ]------------
> Feb 6 15:57:10 G4 kernel: WARNING: at net/sched/sch_generic.c:258
> Feb 6 15:57:10 G4 kernel: Modules linked in: radeon ttm
> drm_kms_helper drm hwmon power_supply ipv6 snd_pcm_oss snd_mixer_oss
> snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
> snd_powermac snd_pcm snd_timer snd soundcore snd_page_alloc dm_mod
> uninorth_agp sungem agpgart sungem_phy
> Feb 6 15:57:10 G4 kernel: NIP: c03dceec LR: c03dceec CTR: 00000001
> Feb 6 15:57:10 G4 kernel: REGS: effefe20 TRAP: 0700 Not tainted
> (2.6.37-gentoo)
> Feb 6 15:57:10 G4 kernel: MSR: 00029032 <EE,ME,CE,IR,DR> CR:
> 44200084 XER: 20000000
> Feb 6 15:57:10 G4 kernel: TASK = ef854cb0[0] 'swapper' THREAD: ef878000 CPU: 1
> Feb 6 15:57:10 G4 kernel: GPR00: c03dceec effefed0 ef854cb0 0000003e
> 00001032 ffffffff c059f182 2074696d
> Feb 6 15:57:10 G4 kernel: GPR08: 000069f7 effee000 01ea1000 00000004
> ffffffff fff80b18 fff80154 00000000
> Feb 6 15:57:10 G4 kernel: GPR16: 00000420 c03dcd4c c0589084 00200200
> c04c9786 ef888814 ef888a14 ef888c14
> Feb 6 15:57:10 G4 kernel: GPR24: 00000001 ffffffff ef12e7a0 00000002
> 00000001 00000000 ef8141d4 ef814000
> Feb 6 15:57:10 G4 kernel: NIP [c03dceec] dev_watchdog+0x1a0/0x2e4
> Feb 6 15:57:10 G4 kernel: LR [c03dceec] dev_watchdog+0x1a0/0x2e4
> Feb 6 15:57:10 G4 kernel: Call Trace:
> Feb 6 15:57:10 G4 kernel: [effefed0] [c03dceec]
> dev_watchdog+0x1a0/0x2e4 (unreliable)
> Feb 6 15:57:10 G4 kernel: [effeff40] [c0043db4] run_timer_softirq+0x1ac/0x260
> Feb 6 15:57:10 G4 kernel: [effeffa0] [c003d9cc] __do_softirq+0x118/0x1ec
> Feb 6 15:57:10 G4 kernel: [effefff0] [c0011398] call_do_softirq+0x14/0x24
> Feb 6 15:57:10 G4 kernel: [ef879ea0] [c000687c] do_softirq+0x88/0xb4
> Feb 6 15:57:10 G4 kernel: [ef879ec0] [c003d178] irq_exit+0x54/0x74
> Feb 6 15:57:10 G4 kernel: [ef879ed0] [c000ead4] timer_interrupt+0x154/0x190
> Feb 6 15:57:10 G4 kernel: [ef879ee0] [c0012080] ret_from_except+0x0/0x14
> Feb 6 15:57:10 G4 kernel: --- Exception: 901 at cpu_idle+0xe0/0x180
> Feb 6 15:57:10 G4 kernel: LR = cpu_idle+0xd4/0x180
> Feb 6 15:57:10 G4 kernel: [ef879fa0] [c000a4f8] cpu_idle+0x170/0x180
> (unreliable)
> Feb 6 15:57:10 G4 kernel: [ef879fc0] [c044952c] start_secondary+0x314/0x350
> Feb 6 15:57:10 G4 kernel: [ef879ff0] [00003270] 0x3270
> Feb 6 15:57:10 G4 kernel: Instruction dump:
> Feb 6 15:57:10 G4 kernel: 2f800001 41be003c 38810008 7fe3fb78
> 38a00040 4bfe77c9 7fa6eb78 7fe4fb78
> Feb 6 15:57:10 G4 kernel: 7c651b78 3c60c050 3863ed12 48068721
> <0fe00000> 38000001 3d20c05c 9809d3bc
> Feb 6 15:57:10 G4 kernel: ---[ end trace 876ff0d47c88271d ]---
> Feb 6 15:57:10 G4 kernel: gem 0002:20:0f.0: eth0: transmit timed out, resetting
> Feb 6 15:57:10 G4 kernel: gem 0002:20:0f.0: eth0:
> TX_STATE[00000001:00000000:00000001]
> Feb 6 15:57:10 G4 kernel: gem 0002:20:0f.0: eth0:
> RX_STATE[0609441d:00000001:00000001]
> Feb 6 15:57:10 G4 kernel: gem 0002:20:0f.0: eth0: Link is up at 1000
> Mbps, full-duplex
> Feb 6 15:57:10 G4 kernel: gem 0002:20:0f.0: eth0: Pause is disabled
> ---
> It seems that the network dies and stalls for about 25 seconds. After a
> while a call trace appears and the rsync session is dead, but the
> whole system does not die.
>
> Regards
> Rüdi
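
For reference, the effect of the one-line change quoted above can be
sketched in user space. This is only an illustration under stated
assumptions, not the driver code: fake_gem, fake_rxmac_reset(),
fake_rxmac_interrupt() and the full_reset flag are hypothetical names,
the MAC_RXSTAT_OFLW value below is a placeholder (the real one lives in
the driver's sungem.h), and it assumes that a nonzero return from the
handler asks the caller to do a full chip reset rather than the lighter
RX MAC reset.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder bit for illustration only; see sungem.h for the real value. */
#define MAC_RXSTAT_OFLW 0x2u

/* Hypothetical stand-ins for the driver's private state. */
struct fake_stats {
	unsigned long rx_over_errors;
	unsigned long rx_fifo_errors;
};

struct fake_gem {
	struct fake_stats net_stats;
	uint32_t smac;      /* pretend copy of the MAC state machine register */
	int rxmac_resets;   /* how often the light RX MAC reset was attempted */
};

/* Hypothetical light reset: counts the attempt and reports success (0). */
static int fake_rxmac_reset(struct fake_gem *gp)
{
	gp->rxmac_resets++;
	return 0;
}

/*
 * Mirrors the shape of the overflow branch in the quoted diff.
 * full_reset == 0: original behaviour, try the RX MAC reset.
 * full_reset != 0: patched behaviour, skip it and return 1 so the
 *                  caller (assumed) escalates to a full chip reset.
 */
static int fake_rxmac_interrupt(struct fake_gem *gp, uint32_t rxmac_stat,
				int full_reset)
{
	int ret = 0;

	if (rxmac_stat & MAC_RXSTAT_OFLW) {
		printf("RX MAC fifo overflow smac[%08x]\n",
		       (unsigned int)gp->smac);
		gp->net_stats.rx_over_errors++;
		gp->net_stats.rx_fifo_errors++;

		ret = full_reset ? 1 : fake_rxmac_reset(gp);
	}

	return ret;
}
```

With full_reset set, the sketch never touches the RX MAC reset path,
which is the point of the test: if the hang goes away, the light reset
is the suspect; if it stays (as in the log above), the problem is more
likely elsewhere.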

