Re: r8169 regression: UDP packets dropped intermittently

From: Jonathan Woithe
Date: Mon Jan 15 2018 - 01:57:18 EST


On Wed, Dec 20, 2017 at 03:50:11PM +1030, Jonathan Woithe wrote:
> On Tue, Dec 19, 2017 at 01:25:23PM +0100, Michal Kubecek wrote:
> > On Tue, Dec 19, 2017 at 04:15:32PM +1030, Jonathan Woithe wrote:
> > > This clearly indicates that not every card using the r8169 driver is
> > > vulnerable to the problem. It also explains why Holger was unable to
> > > reproduce the result on his system: the PCIe cards do not appear to suffer
> > > from the problem. Most likely the PCI RTL-8169 chip is affected, but newer
> > > PCIe variants are not. However, obviously more testing will be required
> > > with a wider variety of cards if this inference is to hold up.
> >
> > The r8169 driver supports many slightly different variants of the chip.
> > To identify your variant more precisely, look for a line like
> >
> > r8169 0000:02:00.0 eth0: RTL8168evl/8111evl at 0xffffc90003135000, d4:3d:7e:2a:30:08, XID 0c900800 IRQ 38
> >
> > in kernel log.
>
> The PCIe card (the one which works correctly with the current driver) shows
> this:
>
> r8169 0000:02:00.0 eth0: RTL8168e/8111e at 0xf862e000, 80:1f:02:45:25:a4, XID 0c200000 IRQ 30
> r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>
> The PCI card (Netgear GA311) which is affected by the problem shows this:
>
> r8169 0000:05:01.0 eth1: RTL8110s at 0xf8706800, e0:91:f5:1b:5f:c6, XID 04000000 IRQ 22
> r8169 0000:05:01.0 eth1: jumbo features [frames: 7152 bytes, tx checksumming: ok]
>
> The system which has shown the regressed behaviour is running a 32-bit
> kernel; for various reasons we can't move to a 64-bit kernel at present.
> However, I was able to boot this system using Slackware 14.2 install discs,
> and therefore test using both 32-bit and 64-bit 4.4.14 kernels. In both
> cases the fault was observed within 30 minutes of starting the tests when
> the GA311 card was in use. The fault is therefore not specific to 32-bit
> environments.
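For anyone comparing their own hardware against the two cards quoted above, a minimal Python sketch (not part of the original report) that pulls the chip name and XID out of an r8169 probe message might look like the following. The regular expression is an assumption based only on the log lines shown in this thread, and `parse_probe_line` and `ProbeInfo` are hypothetical names; other kernel versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of the r8169 probe message, taken from the log lines
# quoted in this thread; other kernel versions may format it differently.
PROBE_RE = re.compile(
    r"r8169 (?P<pci>\S+) (?P<iface>\S+): (?P<chip>\S+) at (?P<addr>0x[0-9a-f]+), "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}),\s+XID (?P<xid>[0-9a-f]+) IRQ (?P<irq>\d+)"
)


class ProbeInfo(NamedTuple):
    """Fields of interest from one probe message (hypothetical container)."""
    pci: str    # PCI address, e.g. "0000:05:01.0"
    iface: str  # interface name, e.g. "eth1"
    chip: str   # variant name reported by the driver, e.g. "RTL8110s"
    xid: int    # chip version register value (XID), parsed from hex
    irq: int    # interrupt line


def parse_probe_line(line: str) -> Optional[ProbeInfo]:
    """Return the parsed probe info, or None if the line is not a probe message."""
    m = PROBE_RE.search(line)
    if m is None:
        return None
    return ProbeInfo(
        pci=m["pci"],
        iface=m["iface"],
        chip=m["chip"],
        xid=int(m["xid"], 16),
        irq=int(m["irq"]),
    )
```

Fed each line of `dmesg` output, this would report `RTL8110s` with XID `0x04000000` for the GA311, and would ignore the "jumbo features" lines.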

Is there any more information that can be provided (or tests done) to assist
in tracking this problem down? Based on the tests done in December, it seems
that the problem only affects specific RTL-8169 variants, with most being
unaffected. Or do we simply have to accept that, for the greater good,
commit da78dbff2e05630921c551dbbc70a4b7981a8fff has permanently broken
Netgear GA311 [1] network cards with respect to these UDP packets, and that
nothing can be done?
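The report does not say how the UDP test itself detects loss. Purely as a hypothetical illustration (the payload format and the helper names `encode`, `decode` and `count_missing` are assumptions, not the test actually used), a sender could prefix each datagram with a 32-bit sequence number and a receiver could count the gaps:

```python
import struct
from typing import Iterable

# Assumed payload format: a 32-bit big-endian sequence number at the start
# of each datagram, padded with zero bytes (hypothetical, for illustration).
SEQ = struct.Struct("!I")


def encode(seq: int, payload_len: int = 64) -> bytes:
    """Build a datagram payload carrying sequence number `seq`."""
    return SEQ.pack(seq).ljust(payload_len, b"\0")


def decode(datagram: bytes) -> int:
    """Extract the sequence number from a received datagram."""
    return SEQ.unpack_from(datagram)[0]


def count_missing(seqs: Iterable[int]) -> int:
    """Count sequence numbers skipped between received datagrams.

    Assumes the sender numbers datagrams 0, 1, 2, ... Duplicates are
    ignored; a datagram that arrives late (out of order) has already been
    counted as missing when the gap was first seen.
    """
    missing = 0
    expected = None  # next sequence number we expect to see
    for seq in seqs:
        if expected is not None and seq > expected:
            missing += seq - expected
        expected = seq + 1 if expected is None else max(expected, seq + 1)
    return missing
```

A receiver would call `decode()` on the data returned by `socket.recvfrom()` and pass the resulting sequence numbers to `count_missing()`; a non-zero result over a test run indicates datagrams that never arrived.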

Regards
jonathan

[1] Or perhaps any card using the RTL8110s variant.