RE: [E1000-devel] large packet loss take2 2.6.31.x

From: Allan, Bruce W
Date: Tue Nov 24 2009 - 10:57:49 EST




>-----Original Message-----
>From: Jarek Poplawski [mailto:jarkao2@xxxxxxxxx]
>Sent: Tuesday, November 24, 2009 3:20 AM
>To: Caleb Cushing
>Cc: e1000-devel@xxxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Frans Pop;
>Brandeburg, Jesse; linux-kernel@xxxxxxxxxxxxxxx; Andi Kleen; Kirsher,
>Jeffrey T
>Subject: Re: [E1000-devel] large packet loss take2 2.6.31.x
>
>On Tue, Nov 24, 2009 at 01:17:09AM -0500, Caleb Cushing wrote:
>> > Btw, currently I don't consider this dropping means there has to be
>> > a bug. It could be otherwise - a feature... e.g. when a new kernel
>> > can transmit faster (then dropping in some other, slower place can
>> > happen).
>>
>> um... where would it be dropping that we wouldn't have a bug? I mean
>> sure faster is great... but if it makes my network not work right...
>
>E.g. if it were dropped because of a queue overflow (but that doesn't
>seem to be the case, at least on your box) or because of memory
>problems while handling a lot of traffic.
>
>>
>> I've added all (I think) information you've asked for to the bug
>> http://bugzilla.kernel.org/show_bug.cgi?id=13835 except for ethtool
>> and netstat on the router side. ethtool complains about not having
>> driver or capability (maybe because it's a 2.4 kernel?) and the
>> version of netstat doesn't support -s. I disabled everything that I
>> can think of that would send/receive packets before doing the test
>> client side, except DHCP/DNS; the Windows boxes were probably sending
>> some broadcasts too, but the traffic should be pretty low. I did
>> remember to set the txqueuelen; it didn't seem to make a difference.
>
>Alas, it's not all the information I asked for. E.g. "netstat -s before
>faulty kernel" and "netstat -s after faulty kernel" seem to be the same
>file: netstat_after.slave4.log.gz. Anyway, since there are problems with
>getting stats from the router we still can't compare them, or check
>for the dropped stats. (Btw, could you also check
>/proc/net/softnet_stat?)
>
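
For anyone following along, here is a minimal sketch of reading the
/proc/net/softnet_stat file mentioned above. It assumes the usual layout:
one row per CPU, whitespace-separated hexadecimal counters, with the first
three columns being packets processed, packets dropped (backlog queue
full), and time_squeeze. The function name parse_softnet_stat is
hypothetical, and the meaning of later columns varies between kernel
versions, so they are ignored here.

```python
def parse_softnet_stat(text):
    """Parse the contents of /proc/net/softnet_stat into per-row counters.

    Assumes the first three hex columns are: processed, dropped,
    time_squeeze. Any further columns are ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),  # row index; normally one row per CPU
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


if __name__ == "__main__":
    # Only works on a Linux box that exposes this file.
    with open("/proc/net/softnet_stat") as f:
        for row in parse_softnet_stat(f.read()):
            print(row)
```

Running it once before and once after a test transfer and comparing the
"dropped" values would show whether the receive backlog overflowed during
the test.
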
>So, it might be the kernel problem you reported, but there is not
>enough data to prove it. Then my proposal is to try to repeat this
>problem in more "testing friendly" conditions - preferably against
>some other, more up-to-date linux box, if possible?
>
>> only error in dmesg I see is
>>
>> e1000e 0000:00:19.0: pci_enable_pcie_error_reporting failed 0xfffffffb
>
>I added e1000e maintainers to CC to have a look at this warning.
>
>Jarek P.

The "pci_enable_pcie_error_reporting failed" message is a non-fatal warning that has recently been removed.
