FYI: AIX SIGDANGER (was: Kernel 2.2.14 OOM killer strikes.)

From: Rodrigo Barbosa (rodrigob@liveware.com.br)
Date: Fri Jul 07 2000 - 12:32:47 EST


On Fri, Jul 07, 2000 at 11:10:48AM -0600, Andreas Dilger wrote:
> Claudio Martins writes:
> > [re SIGDANGER being sent when we are nearly at OOM]
> > The right thing? I don't think so. This doesn't fix anything. Processes
> > will still have to exit!
>
> Not necessarily, since SIGDANGER gives programs with handlers a chance to
> free up some memory before the situation becomes critical. For example,
> Netscape has a 16MB memory cache on my system that it can free at the
> drop of a hat, if it had a chance to do so. Xterms, the X server, mpg123,
> and many other programs have buffers that are not critical to their
> operation that can be freed.
>
> > How can this be tolerated in a multiuser environement? Someone starts a
> > memory hog process and the other users in the system read in their terminal:
> >
> > got SIGDANGER signal: exiting...
>
> It isn't SIGDANGER that kills a process (since the signal is ignored by
> default), but rather the kernel uses the information that a program with
> a SIGDANGER handler are "more important" than ones without a handler.
> It is the OOM killer that kills the processes, just like now - but
> SIGDANGER at least gives you a chance to help out, either by freeing
> memory and/or by doing a graceful shutdown/saving files before being killed.
>
> If the memory hog process does not have a SIGDANGER handler, then it
> will be killed before init/syslog/etc if they DO have a SIGDANGER handler.
> If there is truly a malicious user, there are many bad things they can do,
> and OOM is only one of them.
>

Just adding some more information about AIX SIGDANGER:

The system monitors the number of free paging space blocks and detects when a
paging-space shortage exists. When the number of free paging-space blocks falls
below a threshold known as the paging-space warning level, the system informs
all processes (except kprocs) of this condition by sending the SIGDANGER
signal. If the shortage continues and falls below a second threshold known as
the paging-space kill level, the system sends the SIGKILL signal to processes
that are the major users of paging space and that do not have a signal handler
for the SIGDANGER signal (the default action for the SIGDANGER signal is to
ignore the signal). The system continues sending SIGKILL signals until the
number of free paging-space blocks is above the paging-space kill level.

(...)

It is possible to overcommit resources when using the late allocation algorithm
for paging space allocation. In this case, when one process gets the resource
before another, a failure results. The operating system attempts to avoid
complete system failure by killing processes affected by the resource
overcommitment. The SIGDANGER signal is sent to notify processes that the
amount of free paging space is low. If the paging space situation reaches an
even more critical state, selected processes that did not receive the SIGDANGER
signal are sent a SIGKILL signal.

-- 
Rodrigo Barbosa - Network Administrator 
Liveware, Tecnologia a Servico LTDA - (+55)(35)471-3210
finger rodrigob@morcego.liveware.com.br for PGP Key and more info
"Artificial Intelligence stands no chance against Natural Stupidity."


- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Jul 07 2000 - 21:00:21 EST