Re: Overcommitable memory??

From: James Sutherland (jas88@cam.ac.uk)
Date: Mon Mar 20 2000 - 15:45:32 EST


On Mon, 20 Mar 2000 09:21:51 -0600 (CST), you wrote:
>James Sutherland <jas88@cam.ac.uk>:
>> On Sun, 19 Mar 2000 16:01:17 -0600, you wrote:
>> >On Sun, 19 Mar 2000, James Sutherland wrote:
>> >>On Sat, 18 Mar 2000 08:31:58 -0600, you wrote:
>> >>>On Fri, 17 Mar 2000, Rik van Riel wrote:
>> >>>>On Fri, 17 Mar 2000, Stephen C. Tweedie wrote:
>> >>>>> On Tue, 14 Mar 2000 14:32:49 -0300 (BRST), Rik van Riel
>> >>>>> <riel@conectiva.com.br> said:
>> (snip)
>> >>It was never a system problem to begin with. The question is, which
>> >>process does the system kill? It has to kill one process - killing the
>> >>one which requested the memory is a case of "shooting the messenger".
>> >
>> ><sarcasm>I know a solution --- let's just always kill init. That way we will
>> >definitely have the memory to run....</sarcasm>
>>
>> You have to kill something, let's try looking for something
>> expendable. Something very big and newly started is the most suitable.
>
>THAT DEPENDS ON POLICY - as defined by management. That "very big and newly
>started" may NOT be the most suitable. If resource quotas are in place, then
>the process would not have been started at all. If quotas are adjusted
>to make resources available then again there is no need to abort any process.

It's too late to "adjust quotas" - there are no more resources to hand
out at this point. Something dies - which process is it?

>> >>> Users (and processes) will get Out-Of-Resource
>> >>>errors which the users can deal with.
>> >>
>> >>In practice, they usually just die.
>> >
>> >The user still gets the choice.
>>
>> What "choice" - which error message they want?
>>
>> >>> The system administrators and
>> >>>management can analyze the resource accounting and determine where to
>> >>>assign/reassign allocations of a finite resource.
>> >>
>> >>Most systems don't have/suffer from that degree of micromanagement.
>> >
>> >It is not micromanagement - it is a policy decision that someone has
>> >to make. This decision affects every user, and potentially, the viability
>> >of the company.
>>
>> Setting per-user rlimits low enough to avoid users causing each other
>> problems is reasonable enough. The approach you suggested of nailing
>> the users to the floor just in case they happen to gang up on another
>> user is taking things too far in most cases.
>>
>> Either way, it has nothing to do with this.
>
>It prevents the OOM on the system. It is shifted to the responsible party.
>It is under the control of management.

No. Eventually, processes start dying from lack of memory. Whether this
is a system-wide lack of memory, or simply limited to that user, is
largely immaterial.

>Right now none of this is possible.

Yes, I know you want per-user rlimits. You aren't going to get them in
Linux any time soon.
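
(For the record, what we do have today is per-process limits via
setrlimit(). A minimal sketch - the 32MB figure is just an example -
showing why it doesn't solve your problem: the cap applies to one
process and whatever it forks, it never aggregates across everything
a user is running.)

#include <sys/resource.h>
#include <stdio.h>

int main(void)
{
        struct rlimit rl;

        /* Cap the data segment (brk/malloc) at 32MB for this process
         * and its children. Per-process, NOT per-user: a user can
         * still start twenty of these and eat 640MB between them. */
        rl.rlim_cur = 32 * 1024 * 1024;
        rl.rlim_max = 32 * 1024 * 1024;
        if (setrlimit(RLIMIT_DATA, &rl) != 0) {
                perror("setrlimit");
                return 1;
        }

        /* ... exec the real workload here ... */
        return 0;
}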

>> >>> In all cases, the
>> >>>system should not crash.
>> >>
>> >>I never suggested it should - if it ever crashes, this is a bug. This
>> >>is NOT what we are discussing here. The system doesn't crash - it just
>> >>kills one or more processes. The issue is, WHICH process?
>> >
>> >Right now, and in a LOT of cases - it kills the "wrong" one, making the
>> >system unusable.
>>
>> Such as? Properly configured, almost any process could be respawned
>> automatically. The problem is, whichever process gets an "out of
>> resources" message dies - so which process do you deliver it to?
>
>That depends on management decision, as controlled by resource allocation.

You want to use "management decision" to rewrite the kernel. Fine, go
and use your management-designed kernel.

>> >>> System processes should not abort unless they
>> >>>are exceeding the resources allocated to them. User processes should
>> >>>not be able to cause random process aborts.
>> >>
>> >>Which is the aim of Rik's patch, largely. Until we actually have
>> >>per-user resource quotas, though, there is not a lot you can do about
>> >>it.
>> >
>> >Correct - We Need Resource Quotas - Although Rik's patch makes things
>> >look better sometimes, not always.
>>
>> It is always better than the almost-random deaths at present.
>
>True - the thread is over getting something better.

We already have something better than the old behaviour. The behaviour
you want isn't going to happen any time soon, and it isn't practical
for most people anyway.

>> >>>This also allows self recovery from some forms of DoS attacks. If
>> >>>telnetd receives 1000 connections then it could exceed resource
>> >>>limits. inetd may die (or sendmail or...). With resource limits
>> >>>in place, at least inetd would be able to handle that - if say
>> >>>no more than 50 concurrent interactive logins were allowed, the
>> >>>other connections would be denied.
>> >>
>> >>No - inetd just gets an OOM error, and dies. Same as before.
>> >
>> >No it doesn't - It refuses to spawn more of the daemons (at least it
>> >should - the quota has been exceeded).
>>
>> It might as well exit until resources are available again, though; it
>> can't do anything useful without resources anyway, so why hog the ones
>> it has?
>
>INETD exit?

Yes, it should. Then be respawned, if that is what management decreed.
More efficient than leaving corpses littering the process table doing
nothing other than taking up space.
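
(Sketch of what I mean by respawning - a hypothetical supervisor for a
made-up daemon path, not real init or inetd source; it assumes the
daemon stays in the foreground. This is roughly what an inittab
"respawn" entry buys you.)

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        for (;;) {
                pid_t pid = fork();
                if (pid == 0) {
                        /* hypothetical daemon, assumed not to detach */
                        execl("/usr/sbin/mydaemon", "mydaemon", (char *)NULL);
                        _exit(127);             /* exec failed */
                } else if (pid > 0) {
                        waitpid(pid, NULL, 0);  /* wait for it to die, then loop */
                } else {
                        sleep(5);               /* fork failed (OOM?) - back off */
                }
        }
}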

>> >>> The same goes for Apache which
>> >>>DOES do this.
>> >>
>> >>No. If a child dies, it spawns a replacement (just like inetd, init
>> >>etc. do). If the parent dies (as it would, for OOM situations) the WWW
>> >>site is down.
>> >
>> >If the parent is killed, yes. But the resource quotas have presumably
>> >been computed to support the proper number of slaves. Thus the parent
>> >doesn't die, but the child process (doing a runaway database query for instance)
>> >would.
>>
>> You have no control over which process gets refused resources first -
>> eventually, it will be the parent.
>
>That is why resource quotas are needed. BTW, the parent process is the
>one most unlikely to be aborted - all of its memory has already been allocated
>and used. New connections reuse existing buffers, and pass the connection to
>a child. It would be unlikely that the apache parent has page faults to any
>new memory allocation. It should be counted as a BUG if the parent process
>is aborted on COW.

No; the parent process is not just a static loop spawning children -
it does other things as well. It may be called upon to reread the
configuration file, to take a simple example.
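
(To illustrate - a toy parent, nothing to do with Apache's real source:
even a parent that "never allocates" ends up allocating the moment it
is asked to reread its configuration, and that is exactly when it can
be refused memory.)

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static volatile sig_atomic_t reread;

static void hup_handler(int sig)
{
        (void)sig;
        reread = 1;             /* just flag it; do the work outside */
}

int main(void)
{
        char *conf;

        signal(SIGHUP, hup_handler);

        for (;;) {
                pause();                /* wait for a signal */
                if (!reread)
                        continue;
                reread = 0;
                /* Fresh allocation while rereading the config - the
                 * "static" parent faulting in new memory after all. */
                conf = malloc(64 * 1024);
                if (conf == NULL) {
                        fprintf(stderr, "parent: no memory for config\n");
                        continue;
                }
                /* ... parse the file into conf, rebuild tables ... */
                free(conf);
        }
}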

>> > At this point it becomes the job of "per process" quotas rather
>> >than "per user". This particular example MIGHT be handled by currently
>> >existing limits if it weren't for the fact that the SYSTEM cannot control
>> >resource consumption.
>>
>> Again, we need per-user rlimits. I know that - so what?
>
>That is what we are talking about to be able to control the system level OOM.

Yes, they would help - but making OOM situations impossible on any
real-world multi-user system would require an incredibly restrictive
regime; it would cause far more problems for the users than the
occasional OOM condition does.

Nailing all the users to the ground would indeed make running the
system easier - at least until you are replaced by a more
user-friendly sysop.

>> >>> If the proper resource allocation were given then
>> >>>users running wild memory allocators would not be able to crash
>> >>>the system by causing init to die.
>> >>
>> >>They shouldn't be able to anyway - Rik's patch should ensure that init
>> >>will always survive, even if every other process on the box is hosed.
>> >>(Of course, if init was the malfunctioning process to begin with...)
>> >>
>> >>>All else is a matter of implementation.
>> >>
>> >>The core problem remains. You, the user, have a finite amount of
>> >>memory available to you, however that limit is defined. Once you run
>> >>out, your processes will start dying.
>> >
>> >Yes - ME THE USER. I should not be able to cause the SYSTEM a problem.
>>
>> You aren't able to cause the system a problem.
>
>I can now. I've seen it on every system from 2.x onward. I didn't complain
>before because it hadn't raised its head high enough to cause major failures.
>Now it does.

If you can cause the kernel a problem from userland, that's a bug in
the kernel. Report it, get it fixed.

>> >I should not be able to cause OTHER users a problem.
>>
>> You can't. If you start up a memory hog which would otherwise cause
>> problems, it promptly gets killed.
>
>NOPE. If the hog grows slowly, and has relatively infrequent execution it
>becomes paged out. The processes that get aborted seem to be the ones in memory
>and trying to expand rapidly right then. Besides, the hog may be the properly
>running statistical analysis program that should not be aborted. It is not
>possible to tell via heuristic algorithms. They can work sometimes, but not
>always. It depends on the intended purpose of each system.

Yes. If you have a better algorithm, zip it up and submit it.
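
(For anyone who wants to try: the shape of such a heuristic is simple
enough. This is a hypothetical scoring function to show the sort of
trade-offs involved - the weightings are invented, and it is NOT the
code in Rik's patch. Score every candidate, kill the highest.)

/* Invented "badness" score - illustration only. */
struct candidate {
        unsigned long vm_size;          /* pages of address space */
        unsigned long cpu_secs;         /* CPU time used so far */
        unsigned long run_secs;         /* wall-clock time since start */
        int nice;                       /* -20..19 */
        int is_root;                    /* owned by root? */
};

unsigned long badness(const struct candidate *p)
{
        unsigned long score = p->vm_size;

        /* Long-running processes that have done real work represent
         * an investment; make them progressively harder to pick. */
        score /= (p->cpu_secs / 60) + 1;
        score /= (p->run_secs / 3600) + 1;

        /* Niced-down jobs have volunteered to be less important. */
        if (p->nice > 0)
                score *= 2;

        /* Give root-owned daemons some protection. */
        if (p->is_root)
                score /= 4;

        return score;
}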

>> >>We need per-user resource limits, ideally - until then, everything is
>> >>done with process granularity instead. This is a shortcoming we all
>> >>know about already, but not one that is likely to be fixed any time
>> >>soon.
>> >
>> >That is possibly why the OOM complaints will not go away.
>>
>> I haven't heard any complaints about this. A couple of requests for
>> improvements (per-user rlimits) and a load of complaints about the
>> current malloc() implementation allowing sparse matrices, but not much
>> else.
>
>Every time the OOM complaint comes up. And the thread comes up once or twice
>with every kernel release. This time it seems to have been more completely
>explored than any of the previous occurrences.

Yes; the current handling is about as good as it can get without
per-user resource management, which is a long way off.

>Things that still haven't been talked about (though mentioned somewhere):
>
>a. having instrumented the kernel to determine how much oversubscription
> is needed.
Oversubscription?

>b. Determining what the overhead is for resource quotas
Probably significant, but not unacceptable; it is implemented on other
systems without complaint.

>c. Determining how often the OOM condition occurs.
Frequently enough to be worth handling properly, anyway.

>d. Determining if resource quotas can really help (my opinion) and how much.
They would tend to confine the damage from rogue or malicious
applications to the offending user; the only system-wide effect would
be a drop in performance.

>e. Developing performance curves based on the amount of overcommitting that can
> be done per OOM occurrence and the severity of the OOM.
Enabling overcommit will free up a slice of memory which applications
have claimed but not yet used, reducing the frequency of OOM
conditions. The scale of the effect depends entirely on the
applications in use.
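
(The effect is easy to demonstrate - a toy program, and the sizes are
arbitrary: with overcommit the untouched pages cost nothing until they
are actually written to, so the allocation below succeeds and the RSS
stays tiny; with strict accounting the malloc() itself could fail.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        size_t claimed = 256UL * 1024 * 1024;   /* 256MB "claimed"... */
        size_t used = 4UL * 1024 * 1024;        /* ...only 4MB touched */
        char *p;

        p = malloc(claimed);
        if (p == NULL) {
                fprintf(stderr, "refused %lu bytes\n", (unsigned long)claimed);
                return 1;
        }

        /* Only the written pages get real memory under overcommit;
         * the remaining ~252MB stays purely virtual. */
        memset(p, 1, used);

        printf("claimed %lu MB, touched %lu MB - compare VSZ and RSS in ps\n",
               (unsigned long)(claimed >> 20), (unsigned long)(used >> 20));
        getchar();      /* pause so the figures can be checked */
        return 0;
}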

>I'm already working in one area where OOM is considered catastrophic -
>High security and high availability systems must NOT have the system go
>OOM. That is what per user resource quotas help to prevent.
HA systems, yes - but high security?

>a. high security systems cannot accept losing the audit logger (not necessarily
>   syslog), since having the logger killed could hide a security attack.
Yes, you should exempt the loggers (along with init etc.) from
defensive killing.

>b. High availability servers do not like having the heartbeat monitor die -
> That would signal the other system that the first is down when it really
> isn't. This can lead to data corruption and failure of both systems.
In this case, the correct handling may well be to down the system for
real, letting the backup take over.

>Attempting to auto recover from these failures only makes the system more
>crash prone.
No. We MUST handle the event in some way - the only question is, how
do we handle it - which process(es) do we kill?

James.



