Re: [forcedeth bug] Re: [GIT] Networking

From: Neil Horman
Date: Fri Aug 05 2011 - 07:12:58 EST


On Fri, Aug 05, 2011 at 12:29:03PM +0200, Ingo Molnar wrote:
>
> * Jiri Pirko <jpirko@xxxxxxxxxx> wrote:
>
> > Thu, Aug 04, 2011 at 11:53:54PM CEST, mingo@xxxxxxx wrote:
> > >
> > >* Ingo Molnar <mingo@xxxxxxx> wrote:
> > >
> > >> 0891b0e08937: forcedeth: fix vlans
> > >
> > >Hm, forcedeth is still giving me trouble even on latest -git that has
> > >the above fix included.
> > >
> > >The symptom is a stuck interface, no packets in. There's a frame
> > >error RX packet:
> > >
> > > [root@mercury ~]# ifconfig eth0
> > > eth0 Link encap:Ethernet HWaddr 00:13:D4:DC:41:12
> > > inet addr:10.0.1.13 Bcast:10.0.1.255 Mask:255.255.255.0
> > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> > > RX packets:0 errors:1 dropped:0 overruns:0 frame:1
> > > TX packets:531 errors:0 dropped:0 overruns:0 carrier:0
> > > collisions:0 txqueuelen:1000
> > > RX bytes:0 (0.0 b) TX bytes:34112 (33.3 KiB)
> > > Interrupt:35
> > >
> > >Weirdly enough a defconfig x86 bootup works just fine - it's certain
> > >.config combinations that trigger the bug. I've attached such a
> > >config.
> > >
> > >Note that at least once i've observed a seemingly good kernel going
> > >'bad' after a couple of minutes uptime. I've also observed
> > >intermittent behavior - apparent lost packets and a laggy network.
> > >
> > >I have done 3 failed attempts to bisect it any further - i got to the
> > >commit that got fixed by:
> > >
> > > 0891b0e08937: forcedeth: fix vlans
> > >
> > >... but that's something we already knew.
> > >
> > >Let me know if there's any data i can provide to help debug this
> > >problem.
> > >
> > >Thanks,
> > >
> > > Ingo
> >
> > Interesting.
> >
> > Is DEV_HAS_VLAN set in id->driver_data (L5344) ?
>
Looks like you can match it to the PCI id. Device ids 0x0372 and 0x0373 look to
have the flag set.
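
A minimal sketch of that check, assuming the interface is a PCI device whose id is exposed under /sys/class/net/<iface>/device/device. The helper names are hypothetical, and the id set holds only the two ids named above, so a miss does not prove the flag is absent:

```python
from pathlib import Path

# Device ids named in this thread as having DEV_HAS_VLAN set.
# Assumption: this is NOT the full list from the driver's PCI table.
VLAN_IDS_FROM_THREAD = {0x0372, 0x0373}


def parse_pci_id(text):
    """Parse a sysfs id string such as "0x0372" into an int."""
    return int(text.strip(), 16)


def read_pci_device_id(iface):
    """Read the PCI device id of a network interface from sysfs."""
    path = Path("/sys/class/net") / iface / "device" / "device"
    return parse_pci_id(path.read_text())


def mentioned_as_vlan_capable(device_id):
    """True if the id is one of those named in this thread."""
    return device_id in VLAN_IDS_FROM_THREAD
```

On the affected box, mentioned_as_vlan_capable(read_pci_device_id("eth0")) would answer the question without touching the driver.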

> How do i tell that without hacking the driver?
>
> > If so, would you try to disable both rx an tx vlan accel using
> > ethtool and see if it helps?
>
> Should i do that when the device is in a stuck state and see whether
> it recovers?
>
> Also, please provide the exact ethtool command sequences i should
> try, this makes it easier for me to test exactly what you want me to
> test.
>
Should be:
ethtool -K ethX rxvlan off txvlan off
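
If it helps to script the test, here is a rough sketch that builds the same command for a given interface and shows the offload state before and after. The function names are made up for illustration; the only ethtool calls used are `-K` as above and `-k` to list the current offload settings:

```python
import subprocess


def vlan_offload_cmd(iface, state="off"):
    """Build the ethtool command that sets rx and tx vlan acceleration."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return ["ethtool", "-K", iface, "rxvlan", state, "txvlan", state]


def show_offload_cmd(iface):
    """Build the ethtool command that lists current offload settings."""
    return ["ethtool", "-k", iface]


def disable_vlan_accel(iface):
    """Show offload state, turn rx/tx vlan accel off, show state again."""
    subprocess.run(show_offload_cmd(iface), check=True)
    subprocess.run(vlan_offload_cmd(iface, "off"), check=True)
    subprocess.run(show_offload_cmd(iface), check=True)
```

Calling disable_vlan_accel("eth0") needs root and ethtool installed.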

I'm just poking about, but if I had to guess, it looks like the card you have,
Ingo, is an older forcedeth and uses the older format ring descriptor. (I base
this on the fact that the rx error count noted above only gets incremented in
nv_rx_process, but not nv_rx_process_optimized.) Both paths should support hw
vlan acceleration though, and Jiri's fixes for vlan hw rx acceleration were only
applied to the optimized path.
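
The reasoning above can be sketched as a small check on the ifconfig output quoted earlier. This is only a heuristic built on the guess that rx errors are counted in nv_rx_process alone. The function names are hypothetical, and a zero count proves nothing either way:

```python
import re

RX_LINE = re.compile(
    r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+) frame:(\d+)"
)


def parse_rx_counters(ifconfig_text):
    """Extract the RX counters from old-style ifconfig output."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        raise ValueError("no RX counter line found")
    names = ("packets", "errors", "dropped", "overruns", "frame")
    return dict(zip(names, map(int, match.groups())))


def hints_at_legacy_rx_path(counters):
    """Guess from this thread: a nonzero rx error count suggests the
    non-optimized receive path (nv_rx_process) is in use."""
    return counters["errors"] > 0
```

Fed the RX line from the report, this gives errors=1 and frame=1, which is what points at the older descriptor format.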

Neil

> Thanks,
>
> Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/