Re: How to increat [sic.] max open files - some answers

Mark Hemment (markhe@nextd.demon.co.uk)
Fri, 3 Jan 1997 19:12:16 +0000 (GMT)


On Fri, 3 Jan 1997, Richard B. Johnson wrote:

Hi,

This is a little long-winded, but bear with me and I'll try to reveal
all :)

Let's clear up one BIG misunderstanding:
FD_SETSIZE <= sysconf(_SC_OPEN_MAX)
Note the '<' part! (for those who write non-portable code, sysconf(...) =
NR_OPEN).

OK, under Linux FD_SETSIZE does equal the max num of open files. But try
it out on UnixWare/Solaris (and other SVR4-derived systems). XPG/4
states select() only supports fds below FD_SETSIZE.
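
To see how the two limits compare on a given system, something like the
following can be compiled and run. It is only a sketch: the function
names fds_beyond_select() and report_fd_limits() are made up for this
example, while FD_SETSIZE and sysconf(_SC_OPEN_MAX) are the standard
interfaces discussed above.

```c
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* How many descriptor numbers lie beyond what an fd_set can hold. */
long fds_beyond_select(long open_max, long fd_setsize)
{
    return open_max > fd_setsize ? open_max - fd_setsize : 0;
}

/* Print both limits for the running system; returns the gap between them. */
long report_fd_limits(FILE *out)
{
    /* sysconf() returns -1 if the limit is indeterminate; the gap is
       then reported as 0, which only means "unknown" in that case. */
    long open_max = sysconf(_SC_OPEN_MAX);
    long gap = fds_beyond_select(open_max, (long)FD_SETSIZE);

    fprintf(out, "FD_SETSIZE                = %ld\n", (long)FD_SETSIZE);
    fprintf(out, "sysconf(_SC_OPEN_MAX)     = %ld\n", open_max);
    fprintf(out, "fds select() cannot reach = %ld\n", gap);
    return gap;
}
```

Calling report_fd_limits(stdout) from main() prints the two values; a
non-zero gap means the process can hold descriptors that will not fit
in an fd_set.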

So how do you select on an fd of FD_SETSIZE or greater?
With a simple programming trick. 'Reserve' some low-numbered fds when the
app starts (open() /dev/zero and dup() it a few times). Implement a
wrapper for select() which does a bit of magic to 'map' high-numbered fds
to the reserved low ones, and then makes the system call.
This works fine when the number of descriptors to be select()d is not
greater than FD_SETSIZE. If it is greater, then (on most UNIX OSes) the
poll() system call can be used (which has no such limit).
There is no need for re-compiling libc when Linux supports a v. large
number of file descriptors - just re-coding of apps. (sorry, but you
would need to make the same changes if you wish your code to be portable).
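
A minimal sketch of the reserve-and-map trick, cut down to a single
reserved slot and a single fd waited on for reading. The names
reserve_low_fd() and select_readable() are hypothetical, and using
dup2() to do the 'mapping' is an assumption about how such a wrapper
might work, not something spelled out above. It is not thread-safe.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static int zero_fd = -1;  /* hypothetical: placeholder open on /dev/zero */
static int slot_fd = -1;  /* hypothetical: reserved low fd that high fds borrow */

/* Call at startup, while low-numbered fds are still free. */
int reserve_low_fd(void)
{
    zero_fd = open("/dev/zero", O_RDONLY);
    if (zero_fd < 0)
        return -1;
    slot_fd = dup(zero_fd);
    if (slot_fd < 0 || slot_fd >= FD_SETSIZE) {
        if (slot_fd >= 0)
            close(slot_fd);
        close(zero_fd);
        zero_fd = slot_fd = -1;
        return -1;
    }
    return 0;
}

/* Wait for fd to become readable: 1 = ready, 0 = timeout, -1 = error. */
int select_readable(int fd, struct timeval *timeout)
{
    int low = fd, mapped = 0, ready;
    fd_set rfds;

    if (fd >= FD_SETSIZE) {
        /* Alias the high fd onto the reserved low slot. */
        if (slot_fd < 0 || dup2(fd, slot_fd) < 0)
            return -1;
        low = slot_fd;
        mapped = 1;
    }

    FD_ZERO(&rfds);
    FD_SET(low, &rfds);
    ready = select(low + 1, &rfds, NULL, NULL, timeout);

    if (mapped) {
        /* Put the placeholder back so the slot stays reserved. */
        int saved = errno;
        dup2(zero_fd, slot_fd);
        errno = saved;
    }
    return ready;
}
```

A real wrapper would keep several reserved slots and translate whole
fd_sets in both directions; the single-slot version only shows the
mapping step.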

For the kernel support....
As FD_SETSIZE needs to be divorced from NR_OPEN, changes need to take
place in 'struct files_struct', get_unused_fd(), put_unused_fd(),
do_fork(), etc.
The current scheme is v. efficient, but will not scale well to large
sizes. The solution is probably to use the current scheme when the num of
open fds is <=FD_SETSIZE, and then dynamically allocate 'blocks' of
descriptors.
That should have little effect on small processes, at the expense of
extra overhead for those that wish to use large numbers. A fair
trade-off.
For a complete solution, the poll() sys-call is needed (it's required by
the latest XPG/4 spec anyway).
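
The hybrid table suggested above (fixed array while fds are below
FD_SETSIZE, dynamically allocated 'blocks' beyond that) might look
roughly like this. It is a user-space illustration only, not kernel
code: struct fd_table, FD_BLOCK, MAX_BLOCKS, fd_lookup() and
fd_install() are all invented for the example, and the void pointers
stand in for whatever per-file object the kernel would store.

```c
#include <stdlib.h>
#include <sys/select.h>   /* FD_SETSIZE */

#define FD_BLOCK   256    /* hypothetical: descriptors per extra block */
#define MAX_BLOCKS 64     /* hypothetical: cap on extra blocks */

struct fd_table {
    void  *fixed[FD_SETSIZE];    /* fast path: plain array, as today */
    void **blocks[MAX_BLOCKS];   /* slow path: allocated on first use */
};

/* Return the entry for fd, or NULL if nothing is installed there. */
void *fd_lookup(const struct fd_table *t, int fd)
{
    int idx, b;

    if (fd < 0)
        return NULL;
    if (fd < FD_SETSIZE)
        return t->fixed[fd];

    idx = fd - FD_SETSIZE;
    b = idx / FD_BLOCK;
    if (b >= MAX_BLOCKS || t->blocks[b] == NULL)
        return NULL;
    return t->blocks[b][idx % FD_BLOCK];
}

/* Store file at fd, allocating a block if needed; 0 on success, -1 on error. */
int fd_install(struct fd_table *t, int fd, void *file)
{
    int idx, b;

    if (fd < 0)
        return -1;
    if (fd < FD_SETSIZE) {
        t->fixed[fd] = file;
        return 0;
    }

    idx = fd - FD_SETSIZE;
    b = idx / FD_BLOCK;
    if (b >= MAX_BLOCKS)
        return -1;
    if (t->blocks[b] == NULL) {
        t->blocks[b] = calloc(FD_BLOCK, sizeof(void *));
        if (t->blocks[b] == NULL)
            return -1;
    }
    t->blocks[b][idx % FD_BLOCK] = file;
    return 0;
}
```

A process that never goes past FD_SETSIZE only ever touches the fixed
array, so it pays one extra comparison per lookup and no allocation.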

FYI: UnixWare has select(3C) as a library function, which maps onto
poll(2). This is bad for short timeouts, which forces programmers to use
poll().
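
For reference, waiting on one descriptor with poll() directly looks
something like this. poll_readable() is a made-up helper name; the
struct pollfd fields and POLLIN/POLLNVAL flags are the standard ones.
Note that poll() takes its timeout in whole milliseconds, whereas
select() takes a struct timeval with microsecond fields, which is
presumably part of the short-timeout problem.

```c
#include <poll.h>

/* Wait up to timeout_ms for fd to become readable.
   Returns 1 if ready (or hung up / in error), 0 on timeout, -1 on failure. */
int poll_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;          /* any fd value works; no FD_SETSIZE ceiling */
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;         /* 0 = timed out, -1 = error (see errno) */
    if (pfd.revents & POLLNVAL)
        return -1;        /* fd was not an open descriptor */
    return 1;
}
```

Because the descriptor is passed as a plain int in an array, the same
call works whether fd is 3 or 30000.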

Regards,

markhe

------------------------------------------------------------------
Mark Hemment, Unix/C Software Engineer (Contractor)
markhe@nextd.demon.co.uk
------------------------------------------------------------------