Re: wrong data byte #8 should be 0x8 but was...

From: Jesper Juhl (jesper.juhl@dif.dk)
Date: Sat Feb 19 2000 - 18:12:14 EST


Ragnar Kjørstad wrote:
> I'm on thin ice here, but I think the error is caused by
> programs trying
> to access a long that is not aligned on a /8 memory-adress. I believe
> the kernel automagicly avoids the problem some how - and that the
> messages are just warnings telling you you should fix the program.
>
This sounds resonable. Any ideas about where to look for the cause of this
misalignment so that it could be permanently fixed?

> According to some documentation "wrong byte #8" is hardware problem in
> network card, switch or something. However, I find that highly
> improbably in our case because it happens lots of different machines
> (and in different nets).
>
A hardware problem also sounds highly unlikely to me, as I have 3 identical
DE500-BA cards in the machine, and they all give this error (the cards where
all purchased from different vendors, so something like 3 cards from the
same bad shipment can be probably be eliminated as the cause of this). Each
of the 3 cards is connected to different networks; one is connected to a
100mb/sec IBM switch, one to a 10mb/sec IBM hub and the last to a 10mb/sec
HP hub - so I don't think the cause of the problem is the equipment the
cards are connected to either.

> Once in a while all network activity stops, and if we try ping we get
> the "wrong byte #8" on all packets. The problem remains for a
> long time,
> unless we do (ifconfig eth0 down; ifconfig eth0 up) - then the problem
> stops. Tis leads me to believe the problem is software related.
>
I agree, I have not had any problems with this 'wrong data byte #8' problem
apart from the messages and one attempt to run the Squid 2.3STABLE1 proxy
server. The Squid compiles and installs just fine but when I start it up it
runs fine for about 2 to 3 minutes, and then the NIC's "shutdown" and I have
to do "ifconfig eth[012] down ; ifconfig eth[012] up" to get the cards
working again. All other software runs fine (at least so far - this server
has only been up and running (Linux) for a few days).

> My hunch is that some hardware problem cause the linux-kernel
> to get in
> the wrong "state", so that the problem remains until the interface is
> taken down and up again.
>
Maybe. But I had this server running Digital UNIX for about 6 months with
the same NIC's, and Digital UNIX never showed a simmilar problem. So it has
to have something to do with how Linux handles the hardware. Either Digital
UNIX contain a workaround for a known problem, or Linux handles the hardware
wrongly.

> Does anyone have suggestions for debugging I can do?
>
That's what I'd like to know as well. I'll be more than happy to test out
patches and provide any needed information!

---[ Jesper Juhl ]---

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Feb 23 2000 - 21:00:24 EST