Re: 1e918876 breaks r8169 (linux-3.18+)

From: Tomas Szepe
Date: Sat Feb 21 2015 - 05:16:01 EST


> > > Since linux-3.18.0, r8169 is having problems driving one of my add-on
> > > PCIe NICs. The interface is losing link for several seconds at a time,
> > > the frequency being about once a minute when the traffic is high.
> > >
> > > The first loss of link is accompanied by the message "NETDEV WATCHDOG:
> > > eth1 (r8169): transmit queue 0 timed out" and a call trace, while
> > > subsequent occurrences only report "r8169 0000:01:00.0 eth1: link up"
> > > (w/o the complementary "link down" message).
> > >
> > > I've traced the culprit down to commit 1e918876, "r8169: add support
> > > for Byte Queue Limits" by Florian Westphal <fw@xxxxxxxxx>. Reverting
> > > the patch appears to fix the problem on linux-3.18.5.
> > > The same issue might already have been reported by Marco Berizzi here:
> > > http://lkml.org/lkml/2014/12/11/65
> >
> > Thanks for reporting this! I'm not an lkml subscriber and thus did not
> > see the earlier report.
> >
> > I'll try to reproduce this but unfortunately I am currently travelling
> > and won't have access to my r8169 nic until Feb 10th.
>
> I tried to reproduce this without success so far on my RTL8168d/8111d device.
> I've been running 40 parallel netperf TCP_STREAM tests (1gbit) for the
> last 5 hours and so far I saw no watchdog tx timeouts.
>
> I'll keep this running for a day or so to see if it just takes more time
> to trigger.

So, how's this coming along? Don't you think the patch should be reverted
until the problem is diagnosed/understood/fixed?

--
Tomas Szepe <szepe@xxxxxxxxxxxxxxx>