> SK> This oops has happened before with 2.1.122 as well. I believe it is an
> SK> actual bug in the TCP stack... This machine is one of our most
> SK> heavily-loaded webservers. It's still running, but it only spat out the
> SK> oops a few minutes ago. Anybody have any ideas?
>
> >>>EIP: c010ed83 <del_timer+13/3c>
>
> It seems that we are the only ones seeing it. And I don't see it unless I
> try hard. I considered possible memory corruption but this seems very
> unlikely now as you have it on exactly the same address.
>
> I tried to look at the code and found that something must corrupt the tcp
> probe timer - it's not NULL but it's not valid either.
>
> UP kernel on UP machine, tcp path mtu discovery turned off. I can reproduce
> it with only tcp timestamps off, maybe I would need higher load to
> reproduce it with timestamps on.
Is the other machine running 2.1.xxx too? Can you reproduce it with SACKs
turned off (via /proc/sys/net/ipv4/tcp_sack)?

Why am I asking?
In sock.h:

struct tcp_opt {
        ...
        int num_sacks; /* Number of SACK blocks */
        struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/
        struct timer_list probe_timer; /* Probes */
        ...
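
selective_acks has only four elements (valid indices 0..3), and probe_timer
sits right behind it in the struct. If someone writes to selective_acks[4],
the write spills into probe_timer: you get a corrupted prev field in
probe_timer and the oops you reported.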
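
To make the layout argument concrete, here is a small user-space sketch in C.
The struct definitions are simplified stand-ins made up for illustration
(tcp_opt_sketch, the three-field timer_list and the helper names are
assumptions, not the kernel's definitions), and exact offsets depend on the
compiler and architecture. It only checks that a fifth SACK block would
overlap the bytes of probe_timer.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy, memset, memcmp */

/* Simplified stand-ins, NOT the real kernel definitions. */
struct tcp_sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

struct timer_list {
    struct timer_list *next;
    struct timer_list *prev;
    unsigned long expires;
};

struct tcp_opt_sketch {
    int num_sacks;                            /* Number of SACK blocks */
    struct tcp_sack_block selective_acks[4];  /* valid indices: 0..3 */
    struct timer_list probe_timer;            /* directly behind the array */
};

/* Byte offset at which a fifth SACK block (index 4) would be stored. */
static size_t fifth_sack_offset(void)
{
    return offsetof(struct tcp_opt_sketch, selective_acks)
         + 4 * sizeof(struct tcp_sack_block);
}

/* Nonzero if a write to selective_acks[4] would overlap probe_timer. */
static int fifth_sack_hits_timer(void)
{
    size_t start = fifth_sack_offset();
    size_t end = start + sizeof(struct tcp_sack_block);
    size_t timer = offsetof(struct tcp_opt_sketch, probe_timer);

    return end > timer && start < timer + sizeof(struct timer_list);
}

/*
 * Simulate the stray write. The copy goes through a byte pointer into the
 * enclosing struct, so the demo itself never indexes past the array.
 * Returns nonzero if the bytes of probe_timer changed.
 */
static int simulate_overflow(void)
{
    struct tcp_opt_sketch tp;
    struct timer_list before;
    struct tcp_sack_block stray = { 0xdeadbeefu, 0xcafef00du };

    /* Guard: the simulated write must stay inside the struct. */
    assert(fifth_sack_offset() + sizeof stray <= sizeof tp);

    memset(&tp, 0, sizeof tp);
    memcpy(&before, &tp.probe_timer, sizeof before);
    memcpy((unsigned char *)&tp + fifth_sack_offset(), &stray, sizeof stray);

    return memcmp(&before, &tp.probe_timer, sizeof before) != 0;
}
```

With this simplified layout on 32-bit x86 the stray block covers exactly the
next and prev pointers of the timer; on ABIs that pad before probe_timer the
overlap is only partial, but the timer is still damaged.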
-Andi