Re: Avoiding OOM on overcommit...?

From: Horst von Brand (vonbrand@sleipnir.valparaiso.cl)
Date: Mon Mar 27 2000 - 19:48:24 EST


Linda Walsh <law@sgi.com> said:
> Horst von Brand wrote:

[...]

> > Not in itself, the problem is that if you don't ever want to overcommit
> > anything you must know exactly how much memory each activity could use, in
> > the very worst case.

> No...you are confusing the concept of OS overcommitment with
> prediction of an application's future requests for memory (which can be
> denied).

Nope, I'm talking about the _kernel's_ memory usage here.

> The only thing a program has to "predict" is a maximum stack
> size -- which is physically reserved as a *minimum* at run time. All
> other requests for memory can be denied with an error code.

And crash if you run out of stack? That was supposed to be forbidden...
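
About the only way a program can make that "reserved minimum" real on
current kernels is to pre-fault its worst-case stack at startup. A
sketch -- MAX_STACK and the 4096-byte page size are assumptions here,
not anything the kernel promises:

#include <stddef.h>

/* Touch one byte per page of the worst-case stack so the kernel
 * commits real frames up front.  If this faults, it faults here,
 * at startup, at a known point. */
#define MAX_STACK (256 * 1024)        /* invented worst-case bound */

static void prefault_stack(void)
{
        volatile char frame[MAX_STACK];
        size_t i;

        for (i = 0; i < sizeof frame; i += 4096)
                frame[i] = 0;
}

If the pre-fault itself blows up, at least it blows up at startup, at a
known point, which is all the predictability you can get.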

> > I can understand there are people worried by stuff like C2 security, but in
> > that case you can work with overcommitment, just make sure the tasks
> > crucial for C2 can't run out of resources (unless they are broken or the
> > sysadmin is a complete idiot, that is), and then do as you say: If they do
> > run out, take the whole system down.
> ---
> Well -- that's sorta the point -- Everything from 'atd' to 'vi'
> would need to be rewritten to 'touch' pages of alloc'ed memory. If you
> want to promise integrity -- then you can choose to run with no 'virtual
> swap' and guaranteed _minimum_ stack space sizes allocated at run time.
> With the current model, say, auditd could think it malloc'ed a 2Meg
> buffer -- thus it thinks it has its space guaranteed. If we are in an
> OOM state, when auditd goes to access that buffer, it will SEGV -- the
> kernel can't map the address to a physical object -- or an "OOM" killer
> routine runs and kills another process pseudo-randomly. What I'm saying
> is we need to provide a model that doesn't overcommit. Neither you
> personally nor anyone else has to use that model. But such a model, if
> in the kernel, would allow for operational assurance (allowing failures
> to occur predictably).
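
Concretely, the "touch" rewrite she means would look something like the
sketch below; malloc_touched() is an invented name, nothing any program
actually uses:

#include <stdlib.h>
#include <unistd.h>

/* Dirty one byte per page right after malloc() so the kernel must
 * back the whole buffer now.  Note the touch cannot fail politely
 * under overcommitment; it just moves any OOM event to this one
 * predictable spot instead of a random access later on. */
static void *malloc_touched(size_t len)
{
        long page = sysconf(_SC_PAGESIZE);
        char *buf = malloc(len);
        size_t i;

        if (buf != NULL)
                for (i = 0; i < len; i += (size_t)page)
                        buf[i] = 0;
        return buf;
}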

If you want to go that way, let the kernel do the dirty work. That is
probably easier than fixing several thousand programs.
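
Roughly the idea -- not actual kernel code, all names invented: charge
every page at the time sbrk(2) or mmap(2) asks for it, and fail the call
with ENOMEM, so an unmodified program simply sees malloc() return NULL:

#include <errno.h>

/* Sketch of kernel-side strict commit accounting; a real version
 * would need locking and per-process bookkeeping. */
static long committed_pages;

int commit_charge(long pages, long ram_pages, long swap_pages)
{
        if (committed_pages + pages > ram_pages + swap_pages)
                return -ENOMEM;          /* deny at allocation time */
        committed_pages += pages;
        return 0;
}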

"(allowing for failures to occur predictably)" for whom? Not for me, as the
final user, I just see my programs crash at random or not starting at all.
If they wander into some OOM-killer that is halfways decently done, they
will be killed _less_ (overcommitment will let them go further, perhaps
even go through) and _more_ predictably, i.e., the ones killed will be
probably those that are memory hogs. And as killing them will free more
resources than picking processes at random (or kill anything that does a
sbrk(2) at the wrong time, which is essentially the same), there will be
even less processes killed this way...

> The idea here is to *prevent* overcommitment. OOM can't be
> prevented, but if you have eliminated overcommitment, how OOM is handled
> can be predicted to a certain level. Otherwise, you end up with a
> completely untrusted (non-predictable) state after an OOM event. That's
> fine on some systems, but on others, not. The idea is configurability --
> is that such a bad thing? The *ability* to not overcommit would change
> nothing for you, but for me, it would limit OOM failures to a
> determinate, finite class.

How is the state after killing a memory hog "completely untrusted
(non-predictable)"? If your OS/application is so broken that killing a
process (under whatever circumstances) leads to this, you have much more
than OOM to worry about.

-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile                               +56 32 672616
