Likely cause of EAGAIN in connect() in 2.2.8??

David Luyer (luyer@ucs.uwa.edu.au)
Wed, 12 May 1999 21:21:06 +0800


The EAGAIN error code is not a documented return code from connect() in the
manual pages on my Debian Linux 2.1 system. Either this is a documentation
problem or a kernel problem in that it is ever returned, and if it is simply
a documentation problem, I would still like to know how to prevent my program
from getting it.

(and I know it's not a library issue - I don't use any libraries :-)

The situation: very small optimized network scanner (8k/process memory image,
"statically linked"[1], when I start working on it again I hope to get this
down to 4kb as it is actually using around 2kb and the rest is since I can't
get it to use that on less than 2 pages), forks and does ~250 non-blocking
connects per process and then select()s on them. The only known kernel it
works with is 2.0.36, due to route cache bugs in earlier versions and this
EAGAIN issue I'm hitting now in 2.2.8.

Here's a sample from strace (after a stream of successful connect()s):

[pid 798] connect(23, {sin_family=AF_INET, sin_port=htons(25),
sin_addr=inet_addr("130.95.5.23")}, 16) = -1 EAGAIN
(Resource temporarily unavailable)

I've increased /proc/sys/fs/{inode-max,file-max} to the values I had to under
2.0.36 (32768 and 16384) and it's not getting close (it hits around 6000 of
each) since it bombs out early with this problem. dmesg reports nothing out
of the ordinary.

[and before anyone suggests I sound simply pump out pre-constructed SYN's,
there's a mode in my software which does that too (and discovers interesting
timing problems which can actually make it slower than this method in many
cases), but it was running into another problem which seemed to be a network
driver bug on the specific hardware[2] I have to use this on so I abandoned
tracking down the relevant problems[3]]

David.

[1] statically linked, well, it isn't actually linked against anything.
it uses the _syscall* from the Linux include files to do all the
work.

[2] a Xircom CEM33 card - I'd already fixed one serious problem with the
driver, but when I tried to do a real clean up of the code more, it
didn't work, and I abandoned it.

[3] I suspect I was having problems due to the low resolution which resulted
since most 2.0.x delays use jiffies; 65,000 times one jiffie is intolerably
long, with no delay it's too damn fast for the campus network, and getting
something in between isn't so easy.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/