> write(17, "GET /images/topics/topicslashdot"..., 346) = -1 EAGAIN (Try again)
> over and over again.
> I see that write() is documented to return EAGAIN under some
> circumstances, but I've never seen an application get stuck like
> this. Could something in 2.1.129pre6 be responsible?
I've seen it since early 2.1.x (maybe since the select->poll migration?).
It is probably some difference in poll/select handling, but I haven't tracked
it down yet.
Linux returns EAGAIN when the user tries to write to a not-yet-connected
socket and the socket is set to nonblocking. The application is supposed to
check with poll/select first whether the socket is writeable. If the socket's
state changes between the moment writeability is signalled and the write
itself, that would explain the bug. I've stared at the relevant code paths in
both 2.0 and 2.1 extensively and didn't find a difference that could explain
it, so it might be a race.
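
For reference, here is a minimal sketch of the userspace pattern described
above: wait for writeability with poll, check the result of the pending
connect, and sleep in poll again on EAGAIN rather than retrying write in a
tight loop. The helper names (set_nonblocking, wait_connected,
write_all_nonblock) and the timeout parameter are made up for illustration and
are not taken from any application or from the kernel. It only shows what an
application is expected to do; it is not a fix for the behaviour reported here.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into nonblocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: wait until fd is reported writeable, then read
 * SO_ERROR to learn whether a pending nonblocking connect succeeded.
 * Returns 0 on success, -1 with errno set otherwise. */
int wait_connected(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: write n bytes to a nonblocking fd. On EAGAIN it
 * sleeps in poll() until the fd is writeable again instead of spinning. */
ssize_t write_all_nonblock(int fd, const char *buf, size_t n, int timeout_ms)
{
    size_t off = 0;

    while (off < n) {
        ssize_t w = write(fd, buf + off, n - off);
        if (w > 0) {
            off += (size_t)w;
            continue;
        }
        if (w < 0 && errno == EINTR)
            continue;
        if (w < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int rc = poll(&pfd, 1, timeout_ms);
            if (rc < 0 && errno == EINTR)
                continue;
            if (rc < 0)
                return -1;
            if (rc == 0) {
                errno = ETIMEDOUT;
                return -1;
            }
            continue;
        }
        return -1; /* anything else is treated as a hard error */
    }
    return (ssize_t)off;
}
```

A caller would typically run set_nonblocking(fd), call connect() (expecting
EINPROGRESS), then wait_connected(fd, ...) before the first
write_all_nonblock(fd, ...). If poll keeps reporting the socket writeable while
write keeps returning EAGAIN, this loop will still spin, which is the symptom
in the strace above.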
What would help:
- Others looking at the code; more eyes may find more. Starting points
are tcp.c:wait_for_tcp_connect and tcp.c:tcp_poll (in 2.1) and tcp_select
(in 2.0).
- Someone catching a tcpdump of such an incident (I suspect it has something
to do with asynchronous ICMP error handling, but that is just a theory).
- Someone catching a strace log of it happening, _including_ the last system
calls before the endless loop.
The current workaround is to use a proxy.
-Andi