Re: Avoiding OOM on overcommit...?

From: Olaf Weber (olaf@infovore.xs4all.nl)
Date: Fri Mar 24 2000 - 15:24:16 EST


Marco Colombo writes:
> On 22 Mar 2000, Olaf Weber wrote:
>> Marco Colombo writes:

>>> malloc() "allocates" memory only on its own opaque pool. It has no
>>> OS meaning.

>> Correct. But the C standard says that it has certain properties (and
>> implies some others). Basically, if I write a program and get reports

> Pardon me, a standard either "says" something, or not. You may imply
> something while reading it.

\nitpick{I don't \emph{imply} things when reading the standard, I
\emph{infer} them.}

If you visit comp.std.c on a regular basis, you'll find that there
regular arguments about the exact meaning of the C standard, which can
hinge of things that the text implies without stating them explicitly.

>> that it dies with a SIGBUS in some circumstances, I go on a bug-hunt
>> for the use of misaligned addresses and such. I would be very unhappy

> That's OS (UNIX) semantic. What exacly SIGBUS, SIGSEGV, and the like
> mean is up to the kernel. You should know you may get a signal under
> OOM. So, before bug-hunting, just check if the system was OOM.

And how do you know the system may have been OOM, if you get a report
from someone else who may not have realised this? Bear in mind that
an OOM situation may not have been logged in the syslog, or left any
sign other than "randomly" crashing processes.

>> to discover that I've been looking at a memory shortage masquerading
>> as a program bug. (I do test the return of malloc.) If the signal

> Why do you call receiving a signal a "program bug"?

I don't. But surely you're aware that the typical reason for a
process being sent SIGBUS or SIGSEGV is a program bug. This as
opposed to SIGHUP, SIGTERM, and most of the other signals.

In a similar vein, if a process dies due to SIGXCPU or SIGXFSZ that
you know it dies because it ran into a cpu or file limit. Yes, I
would like to have some OS support from an overcommitting OS, so that
if it decides to kill a process of mine because it ran out memory I
get SIGXMEM instead of SIGBUS.

>> Properly coping with memory that may not be there when it is used
>> requires special programming techniques. I'm not unwilling to use
>> such techniques, when appropriate, but not in all cases.

> No problem. Is is problem that your program gets stoppen when it gets
> SIGTSTP? If no, you don't have to use any "special programming techniques".
> If yes, you may think of trapping the signal and inform the user that
> stopping your application is not possible. Are you concerned of your
> program being killed at OOM time? Use special techniques to avoid
> triggering the OOM killer (i.e., do not page-fault). It's harder
> than just using signal(), i admit it. It may also be impossible to do
> in a really "safe" way (your stack may grow...). So we need a better
> kernel API to handle such cases. Again, not a C standard problem.

One thing that you should bear in mind is what would happen if
"everyone" (i.e., most of the processes of your normal workload) use
these techniques. For example, if they all use mlock, you find that
(1) they all need to run as (setuid) root, and (2) you might as well
run without swap on a non-overcommitting system. That is not an
optimal outcome.

>>> You're missing that an overcommitting MM system is more advanced
>>> than a nonovercommitting one. The kernel "keeps track of memory"
>>> the right way.

>> An overcommitting MM system _can be_ more advanced. It can also
>> overcommit because it cannot figure out how not to do so. As far
>> as I can tell linux currently falls in the latter category: it
>> cannot tell to what extent memory has been overcommitted, nor does
>> it allow for selective overcommitment.

> IFAIK, older linux versions had no overcommitting. The kernel *evolved*
> from a non-overcommitting mm to a overcommitting one.

Which doesn't mean the current implementation cannot be improved upon.
There is sufficient room for improvement that it is certainly arguable
that right now the kernel doesn't keep track of memory the right way.

-- 
Olaf Weber

Do not meddle in the affairs of sysadmins, for they are quick to anger and have no need for subtlety.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:13 EST