I ran 2.0.33+tcpdebug for 5 days until it crashed (sorry, I was in a
hurry, so I just rebooted the machine without inspecting it further).
The machine locked up completely and filled the console with
messages like:
.... couldn't get a free skbuff ...
.... couldn't get a free page ...
There was no output from the debug-skbuff in the logs.
Again, I *really* don't think that this is a hardware-related problem:
2.0.31 and previous versions had uptimes > a month and _never_ had a
problem.
I've been running 2.0.33+tcpdebug again for 1 day now; when it stops
again you'll hear from me with a detailed report. And I hope it locks
up again soon. But since I found nothing in the logs after the previous
crash, I'm not sure if it will help...
What about this:
----------------
Let's assume that at least my problem here is related to a corrupted
skbuff list, damaged by some other kernel code, maybe not even the
networking code. What about adding some kind of CRC to each skbuff head
and walking down the whole list on every kfree_skb()/alloc_skb() (is it
a list, and is this possible?), and possibly at other frequently called
places in the kernel, bringing the system to an immediate halt if a CRC
doesn't match and displaying as much information as possible? Or is
there a better place to do this kind of checking?
Does this make any sense? I'd love to test it. :)
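
To make the idea a bit more concrete, here is a rough user-space sketch
in C of what I mean. It is not kernel code: struct buf_head is a made-up
stand-in for the real skbuff head, all function names are hypothetical,
and I used an Adler-32 style checksum instead of a real CRC just to keep
it short. It only shows the principle of sealing each head and walking
the list to find the first head that no longer matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer head; NOT the real struct sk_buff. */
struct buf_head {
    struct buf_head *next;
    struct buf_head *prev;
    uint32_t len;
    uint32_t check;     /* checksum over all fields above this one */
};

/* Adler-32 style checksum over the bytes that precede 'check'. */
static uint32_t head_sum(const struct buf_head *h)
{
    const unsigned char *p = (const unsigned char *)h;
    size_t n = offsetof(struct buf_head, check);
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Must be called after every legitimate change to a head. */
static void head_seal(struct buf_head *h)
{
    h->check = head_sum(h);
}

/* The list is circular, with 'list' itself acting as a sentinel node. */
static void list_init(struct buf_head *list)
{
    list->next = list->prev = list;
    list->len = 0;
    head_seal(list);
}

static void list_add_tail(struct buf_head *list, struct buf_head *n)
{
    struct buf_head *last = list->prev;

    n->next = list;
    n->prev = last;
    last->next = n;
    list->prev = n;

    /* Every head whose fields changed has to be resealed. */
    head_seal(n);
    head_seal(last);
    head_seal(list);
}

/*
 * Walk the whole list. Returns NULL if every head is intact, otherwise
 * the first head whose checksum or back link is wrong. The checksum is
 * tested before 'next' is followed, so a trashed pointer is caught
 * before it gets dereferenced.
 */
static const struct buf_head *list_verify(const struct buf_head *list)
{
    const struct buf_head *h = list;

    do {
        if (h->check != head_sum(h))
            return h;
        if (h->next->prev != h)
            return h;
        h = h->next;
    } while (h != list);

    return NULL;
}
```

In the kernel, the caller would presumably dump the bad head and stop
the machine (e.g. via panic()) when list_verify() returns non-NULL. The
obvious costs are that every place that modifies a head has to reseal
it, and that walking the whole list on each alloc/free is slow, so this
would only be usable as a debugging option.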
> ..3x kernels, also has the .33 hang problem on multiple machines. So, the
> next question is, how many people that have been having the
> hang/reboot/general blow up and die problems also had memory leaks under
> earlier 2.0.3x kernels?
No, at least not that I noticed. The machine I'm talking about is
idle most of the time, so maybe that is the reason.
pm