Re: Avoiding OOM on overcommit...?

From: Linda Walsh (law@sgi.com)
Date: Mon Mar 27 2000 - 12:45:24 EST


Richard Gooch wrote:
>
> Linda Walsh writes:
> > Richard Gooch wrote:
> > > Each of these options is flawed:
> > >
> > > 1&2) you have to do this for all processes
>
> [bugger, you've removed the context]

---
	Sorry about that...:-)
> 
> >       You have to assign process ID's, memory mappings, etc to every
> > process.  So what's your point?  Having a default guaranteed stack
> 
> No, *I* don't have to allocate PIDs. The kernel does that for me.
---
	The kernel would reserve physical memory for the stack as it does
now.  It's just that right now the reservation is the page(s) actually
used, so, 1 page?  I'm saying that could be tunable upward.  So you
don't *have* to reserve more space.  You just have the choice to.
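
	Purely as illustration -- a minimal sketch, using only interfaces
that already exist (mlock(2) requires privilege), of how a single process
can approximate that kind of physical stack guarantee today by touching
and pinning the pages up front.  The 1 MB figure is an arbitrary
assumption, not part of the proposal:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_RESERVE (1024 * 1024)     /* assumed guarantee: 1 MB */

static void reserve_stack(void)
{
        char guard[STACK_RESERVE];

        /* Fault the pages in so they are physically backed now. */
        memset(guard, 0, sizeof(guard));
        /* Pin them; the lock persists until munlock() or exit. */
        if (mlock(guard, sizeof(guard)) != 0)
                perror("mlock");
}

int main(void)
{
        reserve_stack();
        printf("stack pages touched and (if permitted) locked\n");
        return 0;
}

	It's a per-process hack, of course -- exactly the kind of thing a
global tunable would make unnecessary.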

> To avoid overcommit, I have to limit the stacksize for all
> processes.
---
	Again, it's not a limit.  It's a *minimum* guaranteed to
physically be there (i.e. -- not overcommitted).
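
	To make the limit-vs.-minimum distinction concrete: the knob that
exists today, RLIMIT_STACK, is a *cap* -- grow past it and you get
SIGSEGV.  A short sketch of reading and lowering it (the 8 MB value is
just an example):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) != 0) {
                perror("getrlimit");
                return 1;
        }
        printf("stack cap: soft=%lu hard=%lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

        /* Lower the soft cap; a *minimum* reserve would be the opposite:
         * a floor the kernel must back with physical pages. */
        rl.rlim_cur = 8 * 1024 * 1024;
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
                perror("setrlimit");
        return 0;
}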

> I either set a global value which is big enough for my
> monster simulation, or I go and hand-tune a pile of exceptions.  And
> from time to time get a segfault because I set the limit too low.
---
	You would only segfault if you physically ran out of memory, like
now.  If there is enough memory, the kernel does an 'on-demand'
allocation.  In non-OOM cases, it succeeds, and everything is normal...
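
	A tiny demonstration of that 'on-demand' behavior, assuming
overcommit is on: the allocation itself succeeds, and physical pages are
only claimed as each page is first touched -- which is where an OOM
kill, rather than a malloc failure, would happen.  The 1 GB size is
arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        size_t sz = (size_t)1024 * 1024 * 1024;         /* 1 GB */
        char *p = malloc(sz);

        if (p == NULL) {        /* only fails if overcommit is disabled */
                perror("malloc");
                return 1;
        }
        /* Touching the pages is what actually consumes memory. */
        memset(p, 1, sz);
        puts("all pages faulted in successfully");
        free(p);
        return 0;
}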

> Right now I can safely set the global limit to 128 MiB, and not expect
> any process to exceed that.  If we abolish overcommitting, I've got to
> reserve an awful lot of swap space now.  128 MiB * nproc mounts up
> *very* quickly.
---
	No no no...you are misunderstanding.  You can still have the
exact same behavior as today by allowing whatever the default stack size
is today to be physically allocated (1 page?), and setting your virtual
swap to infinite.  You wouldn't have to change a thing.  That's the
beauty of this -- it's about *flexibility*.  You get your behavior, I
get mine, and anyone else gets any in-between behaviors they desire.
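
	To be clear, 'vswap' is a proposed knob in this thread, not an
existing interface.  Assuming, purely hypothetically, that it were
exposed through /proc, keeping today's behavior would be a one-line
setting, something like:

#include <stdio.h>

int main(void)
{
        /* Hypothetical file: /proc/sys/vm/vswap does not exist today,
         * and "-1" standing for "unlimited" is likewise an assumption. */
        FILE *f = fopen("/proc/sys/vm/vswap", "w");

        if (f == NULL) {
                perror("fopen");
                return 1;
        }
        fprintf(f, "-1\n");     /* unlimited vswap == today's overcommit */
        fclose(f);
        return 0;
}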

> That's why I said "handle". If I meant "killed by" I would have said
> "killed by". For a stack exhaustion signal to be useful to my
> application as a memory management tool (like you argue NULL return
> from malloc(3) can be), I have to *handle* that signal and *do
> something constructive*.
---
	This was a question I asked earlier and didn't get a clear answer
to: is the current context of a process saved on the kernel or the user
stack before a user signal handler is called?  Regardless, you are right
if it is the user stack, and the behavior is no worse than today (unless
you regard sleeping...

> If the default (i.e. the only possibility) is to kill the process,
> then we may as well admit defeat and just use the OOM killer. At least
> the OOM killer is less likely to kill us (or anything), and even if it
> does, it will do it later (hopefully after processing has finished).
---
	Again -- misunderstanding -- if you configure vswap to
<large-value>, you won't run out of memory until you *physically* run
out of memory (and swap).  So the OOM killer would be running at exactly
that point.  The only benefit to you is that it might kill some other
poor slob's process, or some system process that isn't running with
UID==0/CAP_SYS_ADMIN.  CAP_SYS_ADMIN isn't a CAP that processes like
httpd, syslogd, and auditd should be running with, so they could be
better targets than you if you were running as root.  That's not good.
But again, I stress, if that's the behavior you want -- it would be a
kernel compile-time option.

> > > 4) deadlock.
> >
> > 	Deadlock is a risk on any system.  There are no guarantees
> > that resource exhaustion won't lead to deadlock -- in fact, I'd call
> > this a predictable failure.  Deadlock is not guaranteed, however --
> > processes *may* 'exit' or release memory.  The behavior of the
> > machine on lack of resources is a deterministic failure.  Killing
> > random processes is a non-deterministic failure.
>
> Deadlock is not a risk on any system. It happens due to bugs in the
> OS. It's no use to me if the application hangs (or worse, the system
> hangs) rather than gets killed. It's a useless corpse in cyberspace
> either way. You may as well argue the difference between process
> "crash" or "emergency exit". Either way, it's dead, and it didn't get
> a chance to save state.
---
	Very true.  In the case of mandatory auditing, I don't *want* to
save any state that hasn't been recorded if auditing cannot continue.
It breaks the paradigm of mandatory auditing.  It's only deadlock if no
other running process releases memory.  Since all processes would begin
to encounter OOM on allocs and forks, some might be smart enough to shut
down, and possibly signal a system shutdown or a "BIG ALARM BELL" if
they were a critical component.

> > 	And with regard to algorithms to select processes to kill:
> > from the perspective of any user or running process it *is* random,
> > since no user or process is going to know exactly the cpu time,
> > run time, memory usage and memory priority of every other process.
> > It is, more precisely, pseudo-random.  I maintain that random
> > results of an OOM condition are worse than predictable results even
> > if the predictable result group includes deadlock.
>
> As a user, it's just as random to me if my application dies/hangs
> because I ran out of stack space. I'm running a big job and it died
> because it was greedy. The OOM killer will do the same thing, it just
> reduces the chance of death.
---
	How?  If you have physically run out of memory -- something dies.
Should it be a pseudo-random process or the one that actually ran out of
memory?
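
	On the kernel-vs-user-stack question: POSIX already provides one
partial answer.  The signal frame normally lands on the user stack (the
very thing that just overflowed), but sigaltstack(2) plus SA_ONSTACK
lets a stack-overflow SIGSEGV be caught on a separate, pre-allocated
stack.  A minimal sketch (the handler here only exits cleanly; a real
one might flush audit records first):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void on_segv(int sig)
{
        /* write(2) is async-signal-safe; printf(3) is not. */
        static const char msg[] = "stack exhausted -- exiting cleanly\n";
        (void)sig;
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(1);
}

static int recurse(int depth)
{
        volatile char pad[4096];        /* burn a page per frame */
        pad[0] = (char)depth;
        return recurse(depth + 1) + pad[0];
}

int main(void)
{
        stack_t ss;
        struct sigaction sa;

        ss.ss_sp = malloc(SIGSTKSZ);    /* alternate stack for the handler */
        ss.ss_size = SIGSTKSZ;
        ss.ss_flags = 0;
        sigaltstack(&ss, NULL);

        sa.sa_handler = on_segv;
        sa.sa_flags = SA_ONSTACK;       /* deliver SIGSEGV on the alt stack */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        return recurse(0);              /* overflow the stack on purpose */
}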

> At least be honest about the solutions you're proposing and say that
> they move the failure mode over a few steps, but don't fundamentally
> change the nature of the problem: we can die when we run out of
> memory.
---
	But who dies in such a situation would be more deterministic.  A
well-thought-out system could have it such that it *won't* be a critical
process (they've all asked for 8M stack guarantees).  That's what I
want.  I have some assurance that an OOM situation won't destroy *parts*
of the Trusted/Integrity-Assured 'core (Base) OS'.  The system will
*stop* (hang/deadlock/shutdown) before it would allow the Trusted Base
to be compromised.  This is *way cool* in some computing environments.
And again...I stress, it's configurable.
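
	And in the same spirit, something a critical/trusted daemon can
already do today: mlockall(2) (root or equivalent privilege required)
pins all current and future pages so the process never depends on memory
that isn't physically there.  It is not the stack-growth guarantee I'm
describing, just its nearest existing relative:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        /* Pin every current and future page; typically EPERM as non-root. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
                perror("mlockall");
                return 1;
        }
        /* ... critical, audited work happens here ... */
        return 0;
}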

> If you can come up with a programming model such that stack exhaustion
> results in error codes that can be handled by the application, rather
> than killing/hanging, then you'll have made an important first step,
> and maybe then the word "deterministic" can be used with meaning. But
> then, we're not talking POSIX anymore.
---
	But I don't need error codes for my deterministic system.  Error
codes are one way of providing deterministic behavior.  Ultimately, if
error codes can't successfully be delivered, then if I know the system
will be 'stopped', it is deterministic (still configurable behavior).
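
	For reference, the "error code" model being asked for already
exists for the heap: with no overcommit, malloc(3) returns NULL when
memory can't be backed, and the application can wind down gracefully.
Nothing equivalent exists for stack growth.  A sketch (the checkpoint
step is just a placeholder):

#include <stdio.h>
#include <stdlib.h>

/* Illustrative wrapper: treat allocation failure as a handled error. */
static void *xmalloc(size_t n)
{
        void *p = malloc(n);

        if (p == NULL) {
                fputs("out of memory: saving state and exiting\n", stderr);
                /* ... checkpoint/flush audit records here ... */
                exit(1);
        }
        return p;
}

int main(void)
{
        char *buf = xmalloc(4096);
        buf[0] = 0;
        free(buf);
        return 0;
}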

> It should be clear.
---
	It's clear we have *slightly* different ideas about deterministic
behavior and the value of such, but that doesn't mean we can't come to
support an option that provides for both of us.

-linda

--
Linda A Walsh    | Trust Technology, Core Linux, SGI
law@sgi.com      | Voice: (650) 933-5338

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:20 EST