Re: Avoiding OOM on overcommit...?

From: Marco Colombo (marco@esi.it)
Date: Fri Mar 24 2000 - 07:16:02 EST


On 22 Mar 2000, Olaf Weber wrote:

> Marco Colombo writes:
> > On 21 Mar 2000, Olaf Weber wrote:
>
> >> On ther other hand, most programs do not use sbrk(2) to obtain memory,
> >> they use malloc(3). The difference is that (following the C standard)
> >> there is a guarantee that if malloc succeeds, then the memory can be
> >> used. On a system that unconditionally overcommits, this guarantee
> >> doesn't hold.
>
> > Does the standard say something about paging? (I don't know, but i think no)
> > Paging is transparent, until you *time* what you're doing. Then you'll
> > see what the difference is between RAM and disk access time.
>
> Etc. I think you're just being silly here, and know it.

Yes, partly i am.
But you are used to paging, you may think it is natural that when you
need real RAM, you have to ask for it. To non Unix programmers, this is
weird. Just like having you application "sleep" with a spin loop, which
is a common programming tecnique in non timesharing systems.
It is not bad *C language* programming, it's bad *UNIX* programming.

The OS may always signal you that something happened: the user of your
controlling terminal just exited (hangup), your program has been requested
to terminate, we're OOM, etc. That's OS semantic. It's outside the scope
of the C standard.

You may say the Linux kernel should not kill random process, and i may
agree. Don't say malloc() behaviour is not "standard". The C language
does not say anything about when or why a process should get a signal
by the OS. The C language, BTW, has no notion of what a process is.

> > malloc() "allocates" memory only on its own opaque pool. It has no
> > OS meaning.
>
> Correct. But the C standard says that it has certain properties (and
> implies some others). Basically, if I write a program and get reports

Pardon me, a standard either "says" something, or not. You may imply
something while reading it.

> that it dies with a SIGBUS in some circumstances, I go on a bug-hunt
> for the use of misaligned addresses and such. I would be very unhappy

That's OS (UNIX) semantic. What exacly SIGBUS, SIGSEGV, and the like
mean is up to the kernel. You should know you may get a signal under
OOM. So, before bug-hunting, just check if the system was OOM.

> to discover that I've been looking at a memory shortage masquerading
> as a program bug. (I do test the return of malloc.) If the signal

Why do you call receiving a signal a "program bug"? A program may receive
SIGHUP, SIGINT, SIGTERM, and many others, under normal (usually external)
conditions. OOM is an "external" condition. You may suggest that we need
a new signal, SIGABOUTTOOOM, but this is OS matter.

> sent had been (say) SIGNOMEM instead then I'd at least be spared that
> particular incovenience.

you need better support from the OS, then. Adherence to C standard is
another matter.

> > On systems with no protected memory (yes, you compile C programs on
> > them) you can access every address you like. No need to malloc. You
> > may write you own heap management routine. libc provides one. On
> > system with memory protection, you can still write your own heap
> > management routine, which uses whatever system call to do its jobs.
> > malloc is a "standard" way to manage a heap.
>
> If you want to write even moderately portable code, then you haven't
> got many options besides trusting malloc. More sophisticated methods
> of obtaining memory tend to differ between systems, and even between
> different revisions of a system. Writing your own memory management
> routines will (in portable code) typically be done _on top of_ malloc
> rather than in stead of it.
>
> Properly coping with memory that may not be there when it is used
> requires special programming techniques. I'm not unwilling to use
> such techniques, when appropriate, but not in all cases.

No problem. Is is problem that your program gets stoppen when it gets
SIGTSTP? If no, you don't have to use any "special programming techniques".
If yes, you may think of trapping the signal and inform the user that
stopping your application is not possible. Are you concerned of your
program being killed at OOM time? Use special techniques to avoid
triggering the OOM killer (i.e., do not page-fault). It's harder
than just using signal(), i admit it. It may also be impossible to do
in a really "safe" way (your stack may grow...). So we need a better
kernel API to handle such cases. Again, not a C standard problem.

> [...]
>
> >> In any case, it seems people are now mostly shooting the same
> >> arguments at each others. More constructive (but definitely more
> >> work) would be to instrument a kernel to actually keep track of memory
> >> in sufficient detail, and (for example) log messages when it is
>
> > You're missing that an overcommitting MM system is more advanced than
> > a nonovercommitting one. The kernel "keeps track of memory" the right way.
>
> An overcommitting MM system _can be_ more advanced. It can also
> overcommit because it cannot figure out how not do so. As far as I
> can tell linux currently falls in the latter category: it cannot
> tell to what extent memory has been overcommitted, nor does it allow
> for selective overcommitment.

IFAIK, older linux versions had no overcommitting. The kernel *evolved*
from a non-overcommitting mm to a overcommitting one.

> --
> Olaf Weber
>
> Do not meddle in the affairs of sysadmins,
> for they are quick to anger and have no need for subtlety.
>

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:12 EST