Re: Sun GEM PPC32 Bug?

From: Benjamin Herrenschmidt
Date: Tue Feb 08 2011 - 19:18:32 EST


On Tue, 2011-02-08 at 20:58 +0100, Andreas Schwab wrote:
> Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> writes:
>
> > What's your machine model (cat /proc/cpuinfo) and what do you do to
> > trigger the problem ? I'm trying to reproduce here and so far had
> > no success doing so.
>
> Just today I saw the same problem on my PowerMac G5, while sending a lot
> of data over LAN.

This isn't the same problem... this looks like a tx timeout. Or do you
have some previous messages you didn't paste indicating that it all
started with an RX overflow ? :-)
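
One rough way to check this against a saved copy of the kernel log is sketched below. The TX patterns are taken from the messages quoted further down. The RX overflow pattern is only an assumption about how such a message would be worded, and first_event is a made-up helper name, so adjust both to what your kernel actually prints.

```python
import re

# Patterns for the TX side, taken from the log quoted in this thread.
TX_TIMEOUT = re.compile(r"transmit timed out|NETDEV WATCHDOG")
# ASSUMPTION: the exact wording of an RX overflow message is a guess.
RX_OVERFLOW = re.compile(r"RX.*overflow", re.IGNORECASE)


def first_event(log_text):
    """Return 'rx_overflow' or 'tx_timeout' for the earliest matching
    line in log_text, or None if neither pattern appears."""
    for line in log_text.splitlines():
        if RX_OVERFLOW.search(line):
            return "rx_overflow"
        if TX_TIMEOUT.search(line):
            return "tx_timeout"
    return None
```

Feeding it the full dmesg output as one string (for example the contents of a saved dump) shows which kind of event came first. If it reports a tx timeout, there was no earlier RX overflow in that log.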

My main G5 has tg3's, but I still have a crash box with sungem. I'll
hammer it over a cross-over cable and see if I can make anything happen.

Cheers,
Ben.

> NETDEV WATCHDOG: eth0 (gem): transmit queue 0 timed out
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:256
> Modules linked in: usb_storage uas tcp_diag inet_diag firewire_sbp2 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device nfsd lockd exportfs auth_rpcgss nfs_acl sunrpc tun cpufreq_conservative cpufreq_userspace cpufreq_powersave nf_conntrack_ipv6 nf_defrag_ipv6 ip6t_REJECT ip6t_LOG ip6table_filter ip6_tables xt_TCPMSS xt_recent xt_state ipt_REJECT ipt_LOG xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables loop snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus sg snd_pcm firewire_ohci snd_page_alloc firewire_core sr_mod snd_timer crc_itu_t uninorth_agp cdrom sungem sungem_phy snd agpgart soundcore linear sd_mod pata_macio dm_snapshot dm_mod sata_svw libata scsi_mod
> NIP: c00000000030ae50 LR: c00000000030ae4c CTR: 0000000000000001
> REGS: c00000000fff3a00 TRAP: 0700 Not tainted (2.6.38-rc3)
> MSR: 9000000000029032 <EE,ME,CE,IR,DR> CR: 48ffff84 XER: 20000000
> TASK = c00000017a0d28c0[0] 'swapper' THREAD: c00000017a0f0000 CPU: 1
> GPR00: c00000000030ae4c c00000000fff3c80 c00000000085e410 000000000000003e
> GPR04: 0000000000000001 c00000000004d6f0 0000000000000000 0000000000000001
> GPR08: 0000000000000000 c00000017a0d28c0 c00000000006eb04 0000000000000001
> GPR12: 7472616e736d6974 c00000000ffff780 c0000001778d4400 0000000000000001
> GPR16: 0000000000000000 c0000001778d4000 c00000017a119c60 0000000000000100
> GPR20: c000000000869280 c00000017a119060 c00000017a119460 0000000000000001
> GPR24: ffffffffffffffff c00000017a5f0780 0000000000000002 0000000000000001
> GPR28: 0000000000000000 c0000001778d43a0 c0000000007f7350 c0000001778d4000
> NIP [c00000000030ae50] .dev_watchdog+0x19c/0x2cc
> LR [c00000000030ae4c] .dev_watchdog+0x198/0x2cc
> Call Trace:
> [c00000000fff3c80] [c00000000030ae4c] .dev_watchdog+0x198/0x2cc (unreliable)
> [c00000000fff3d80] [c00000000005986c] .run_timer_softirq+0x1c4/0x264
> [c00000000fff3ec0] [c00000000005385c] .__do_softirq+0xe8/0x1c4
> [c00000000fff3f90] [c000000000017628] .call_do_softirq+0x14/0x24
> [c00000017a0f39b0] [c00000000000b2bc] .do_softirq+0x78/0xc4
> [c00000017a0f3a50] [c0000000000539f8] .irq_exit+0x4c/0x9c
> [c00000017a0f3ad0] [c000000000014704] .timer_interrupt+0xbc/0xd4
> [c00000017a0f3b60] [c000000000003c8c] decrementer_common+0x10c/0x180
> --- Exception: 901 at .cpu_idle+0x110/0x1d4
> LR = .cpu_idle+0x110/0x1d4
> [c00000017a0f3e50] [c0000000000108fc] .cpu_idle+0x64/0x1d4 (unreliable)
> [c00000017a0f3ee0] [c0000000003d22d0] .start_secondary+0x310/0x320
> [c00000017a0f3f90] [c0000000000072dc] .start_secondary_prolog+0x10/0x14
> Instruction dump:
> 41fe0040 38810070 7fe3fb78 38a00040 4bfea021 60000000 7fe4fb78 7f86e378
> 7c651b78 e87e8030 480c35cd 60000000 <0fe00000> e93e8018 38000001 98090008
> ---[ end trace cc84d3d8a2a0b1a7 ]---
> gem 0001:03:0f.0: eth0: transmit timed out, resetting
> gem 0001:03:0f.0: eth0: TX_STATE[003ffc05:00000001:0000001f]
> gem 0001:03:0f.0: eth0: RX_STATE[0100c805:00000001:00000021]
> gem 0001:03:0f.0: eth0: Link is up at 100 Mbps, full-duplex
> gem 0001:03:0f.0: eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
>
> The watchdog message happened only once, but the transmit timeouts
> recurred over the whole transfer.
>
> Andreas.
>

