On 2002-10-15 13:55:17 +0200, Michal Kara wrote:
> Several times I had real problems with batch jobs failing with EAGAIN,
> printed as "Resource temporarily unavailable". Not with the failure, but to
> determine the real cause is really a pain. Usually, the problem is in
> resource limits (rlimit, set by ulimit), but the returned error code is
> misleading.
I've investigated the use of the fork() system call across some
programs in Debian GNU/Linux and I've found that the current fork
semantics are not very good.
When Linux can't fork() returns a fork error that the application
usually sees as a fatal error. Instead of the fatal error fork
should wait for resources to be available, thus never returning an
error. If the current user has exceeded one of its limits the program
should block waiting for another program of the same user to free
resources. There's a possible user deadlock in this approach that the
kernel should signal.
As no application waits for fork to be available most applications threat
a fork() error as fatal.
A possible fork interface would be waiting for resources to be available
and only returning with error in case of deadlock.
This needs to be fixed in the kernel as there's no interfaces in user
space to wait for fork() availability or signaling deadlocks.
This interface is also applicable to any other memory requesting system
call (like brk) that can return ENOMEM (instead of waiting for memory to
be available), or falling on deadlock.
As an example consider bash. In case of fork() error the program
isn't even run thus causing a fatal error. If fork() waited for
resources to be available there wouldn't be any problem.
Is there a syscall to wait for fork to be available instead of sleeping
an amount of time between fork()?
Attached is a bash patch that works but it's not optimal without a proper
kernel interface as it waits a fixed amount of time betteen fork()
attemps and can't detect deadlocks.
This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:55 EST