Not being an expert in the Linux networking code, I can offer only a few
disinterested observations:
- Maybe the evil IS in the queue layer, and others haven't noticed as
their ethernet performance isn't as stellar as yours. Do the errors
occur randomly, or only under high load?
- Is there any way of a) dumping the stack and freezing when the error
occurs, so as to analyze the state of the kernel that led to the error
(easy, just write it 8-) ), b) returning a special code when this
occurs, so that the higher layers of network code above it can dump all
appropriate state (see answer to a above), or c) disabling all except
disk interrupts and writing a kernel or entire machine core image to
swap space when this occurs?
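
As a rough illustration of option b), here is a userspace C sketch, not
kernel code. ERR_QUEUE_CORRUPT, struct queue_state, queue_check(),
dump_state() and send_packet() are all names made up for the example and do
not correspond to anything in the Linux networking code; in the kernel the
dump would presumably go through printk() rather than into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error code, chosen for this sketch; not a real errno. */
#define ERR_QUEUE_CORRUPT (-1000)

/* Hypothetical snapshot of the state we would want to inspect later. */
struct queue_state {
    int head;
    int tail;
    int length;
};

/* Lower layer: return the special code if the queue looks inconsistent. */
static int queue_check(const struct queue_state *q)
{
    if (q->head < 0 || q->tail < 0 || q->length < 0)
        return ERR_QUEUE_CORRUPT;
    return 0;
}

/* Write a one-line description of the queue state into buf. */
static int dump_state(const struct queue_state *q, char *buf, size_t len)
{
    assert(q != NULL && buf != NULL);
    return snprintf(buf, len, "queue: head=%d tail=%d length=%d",
                    q->head, q->tail, q->length);
}

/*
 * Higher layer: on the special code, record the state while it is still
 * intact, then pass the code further up so other layers can do the same.
 */
static int send_packet(const struct queue_state *q, char *log, size_t len)
{
    int rc = queue_check(q);

    if (rc == ERR_QUEUE_CORRUPT)
        dump_state(q, log, len);
    return rc;
}
```

Calling send_packet() on a queue with a negative length leaves a one-line
state description in the log buffer and returns the special code; a healthy
queue returns 0 and leaves the buffer untouched. Each caller can then decide
whether to dump its own state, freeze, or carry on.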
Maybe I've missed some information on this thread, but the information
I've seen so far ("Somewhere at or after reaching <vaguely defined state
x> my machine hangs") doesn't give a potential debugger much to go on.
How many printk()s have been added to the code so far in an attempt to
understand what's going on? Calls to a function to dump state? Rather
than wasting time arguing whether it's a problem or not, the affected
user should endeavour to provide as much information as possible. This
may involve kernel modifications, hired help, packet sniffers,
in-circuit emulators, experimentation with different hardware, voodoo, dead
poultry and inconvenience to users. On the other hand, if he finds it
more cost-effective to replace the offending hardware and/or OS, so be
it.

My sincerest apologies if I've missed something relevant.