So you see outgoing SYN packets, but no SYN replies coming from the remote
peer ? (you mention ACKS, but the first packet received from the remote
peer should be a SYN+ACK),
Right, I meant to say SYN+ACK. I don't see them coming back.
When the problem comes, instead of restarting the application, please take a
tcpdump of say 10.000 packets.
Then turn off tcp_sack and take a 2nd tcpdump sample, and make both samples
available to us.
I can take these captures and take a look at the results.
Unfortunately, I don't think I'll be able to make the captures
available to the general public.
If turning off tcp_sack makes the problem go away, why dont you turn it off
all the time ?
Unfortunately, I think that will be the answer if I can't get any help
fixing this problem in the kernel. It's a bummer, because many of the
remote hosts my application communicates with are on wireless links,
so there may be performance implications to turning SACK off.