> On Mon, 30 Dec 1996, Mike Wangsmo wrote:
>
> > I have noticed the following error showing up in the log file lately. I
> > have seen it with 2.1.[17,18] for sure and possibly before them as well.
> > It has only happened sporadically and is not something that I think I can
> > repeat predictably. There are usually hundreds of copies of the error in the
> > messages log file, too.
> >
> > Dec 29 16:11:53 bridger kernel: TCPv4 bad checksum from 51fe0298:d221 to
> > 3228a3cd:0426, len=730/730/750
> >
> > Any thoughts?
> >
> > Thanks,
> > Mike
>
> I had a similar concern that it might have been something weird with my
> connection. Someone, Alan Cox, I believe, responded and said it was
> basically a normal debugging message that's in use for the 2.1.x kernels,
> and it basically means that something got corrupted on its way to your
> machine.
>
> I got sick of it too, so I wrote a small and extremely simple patch. All
> it does is change two lines to comment out that part of the tcpv4 code.
> Obviously, I've been using it here with no problems and there's no reason
> why it wouldn't be safe, but the standard disclaimer applies.
>
[PATCH SNIPPED]
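For anyone wanting to see who the offending peer is, the hex fields in the
quoted log line can be turned back into dotted-quad addresses and decimal
ports. This is only a sketch: the `decode` helper is my own name, and I have
not checked which byte order the kernel used when it printed the 32-bit
address, so both readings are returned and you pick the one that makes sense
for your network.

```python
import re
import socket
import struct

LINE = ("TCPv4 bad checksum from 51fe0298:d221 to "
        "3228a3cd:0426, len=730/730/750")

PATTERN = re.compile(
    r"from ([0-9a-f]{8}):([0-9a-f]{4}) to ([0-9a-f]{8}):([0-9a-f]{4})")

def decode(line):
    """Return [(addr_as_printed, addr_byte_swapped, port), ...] for src, dst."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a 'bad checksum' line")
    result = []
    for addr_hex, port_hex in ((m.group(1), m.group(2)),
                               (m.group(3), m.group(4))):
        value = int(addr_hex, 16)
        # Reading 1: the hex digits are already in dotted-quad order.
        as_printed = socket.inet_ntoa(struct.pack(">I", value))
        # Reading 2 (assumption): a little-endian machine printed the raw
        # network-order word, so the bytes appear reversed.
        byte_swapped = socket.inet_ntoa(struct.pack("<I", value))
        result.append((as_printed, byte_swapped, int(port_hex, 16)))
    return result

if __name__ == "__main__":
    for as_printed, byte_swapped, port in decode(LINE):
        print(as_printed, "or", byte_swapped, "port", port)
```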
In theory, Ethernet can't deliver a corrupted frame. The CRC should guarantee
that, if a frame got through, it's perfect. So why do we have checksum errors?
Possible answers:
1 Data corruption between the Ethernet card (SNIC) and RAM.
2 Race condition where last data was not used before being partially
overwritten by new data.
3 Other software bugs similar to (2).
4 A bad Ethernet card causing (1).
5 Bad motherboard timing causing (1).
6 Other hardware problems causing (1).
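To make case (1) concrete, here is a small sketch of why the Ethernet CRC
and the TCP checksum can disagree. The CRC is checked by the card, so damage
that happens after that check (card to RAM) is only visible to the TCP
checksum. The function names are mine, `zlib.crc32` merely stands in for the
Ethernet frame check sequence, and the checksum is computed over the payload
alone (real TCP also covers the header and a pseudo-header), so treat it as
an illustration and not as what the kernel does.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def simulate_bus_corruption():
    """Return (crc_ok, tcp_ok) for a frame damaged after the CRC check."""
    payload = bytes(range(256)) * 2          # arbitrary 512-byte payload
    tcp_sum = internet_checksum(payload)     # computed by the sender
    fcs = zlib.crc32(payload)                # appended by the sending card

    # The wire delivers the frame intact, so the receiving card's CRC passes.
    crc_ok = zlib.crc32(payload) == fcs

    # One bit flips on the way from the card to RAM (hypothetical fault).
    in_ram = bytearray(payload)
    in_ram[100] ^= 0x04

    # The receiving host verifies the checksum over what is actually in RAM.
    tcp_ok = internet_checksum(bytes(in_ram)) == tcp_sum
    return crc_ok, tcp_ok

if __name__ == "__main__":
    print(simulate_bus_corruption())
```

The CRC passes and the checksum fails, which is the situation that would
produce the log message.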
In my work, I have used numerous Ethernet boards on PC/AT-type machines.
We make, amongst other things, a CAT-Scanner that uses SNICS (Serial Network
Interface Controllers) of the NE* type, for communications and control and
a high-speed data-link (no network, just Ethernet).... we ping-pong two
controllers and get about 18 megabytes/second continuous data transfer....
I have found that most all store-bought controllers are BAD. They don't
comply with the bus-interface timing specifications at all. It is surprising
that they work at all. We had to make our own from standard chips and
an interface PAL that provides the correct timing...
Given this, I suspect that the error messages are ___PERFECTLY_NORMAL___,
considering the junk that we are forced to purchase. The best Ethernet
card I have found was a 3COM 3C509. It isn't very fast but at least the
interface timing is correct. Some of the alleged 100 Mb/s cards, i.e.,
the more modern boards, are terrible.
But... If one out of every 100 packets is corrupted due to poor interface
timing, that's 99% efficiency. If the board is a few percent faster, in
spite of its errors, the result might be a performance gain. This presumes
that the packet that has to be retransmitted comes from a compliant
IP interface that correctly keeps track of the packets. Some don't, so
you get a whole bunch of duplicates that have to be ACKed and thrown away.
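The arithmetic behind that trade-off can be sketched in a few lines. The
model is deliberately crude and is my own assumption: every bad packet costs
exactly one wasted transmission and nothing else, so retransmit timeouts,
backoff and the duplicate-ACK mess mentioned above are all ignored. The
function names and the 3% figure are only examples.

```python
def goodput(raw_rate: float, error_fraction: float) -> float:
    """Useful throughput if a fraction of packets is thrown away."""
    return raw_rate * (1.0 - error_fraction)

def break_even_speedup(error_fraction: float) -> float:
    """How much faster a lossy card must be to match an error-free one."""
    return 1.0 / (1.0 - error_fraction)

if __name__ == "__main__":
    clean = goodput(1.00, 0.00)   # compliant card, no errors
    sloppy = goodput(1.03, 0.01)  # 3% faster card, 1 bad packet in 100
    print("clean: ", clean)
    print("sloppy:", sloppy)
    print("break-even speedup at 1% errors:", break_even_speedup(0.01))
```

Under these assumptions the faster card only has to be a bit over 1% quicker
to come out ahead. Once timeouts enter the picture the real cost of a lost
packet is higher, so the margin needed in practice is larger.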
Cheers,
Dick Johnson
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard B. Johnson
Project Engineer
Analogic Corporation
Voice : (508) 977-3000 ext. 3754
Fax : (508) 532-6097
Modem : (508) 977-6870
Ftp : ftp@boneserver.analogic.com
Email : rjohnson@analogic.com, johnson@analogic.com
Penguin : Linux version 2.1.16 on an i586 machine (66.15 BogoMips).
Warning : It's hard to remain at the trailing edge of technology.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-