Re: Virtual vs. physical swap & shared memory forks (sprocX)

From: David Whysong (dwhysong@physics.ucsb.edu)
Date: Sat Mar 25 2000 - 05:48:08 EST


On Sat, 25 Mar 2000, Linda Walsh wrote:

>David Whysong wrote:
>> Non-deterministic with the current kernels that randomly kill things, yes.
>> I certainly don't like the situation. But "fixing" the problem by adding
>> new system calls isn't a good solution -- you've redefined the problem
>> such that all current software is broken, and needs to be rewritten to use
>> your syscall.
>---
> All? That's a pretty strong statement. Programs that currently
>behave will continue to run. Programs that spawn off hundreds of 40 megabyte
>processes are being careless. They are relying on the non-deterministic
>operating system not enforcing memory restrictions.

Which problem are you trying to solve, exactly? There have been several
problems attributed to overcommit in the various related threads:

        1. User tasks can be killed on malloc()
        2. User tasks can crash the system by running it OOM
        3. Memory DoS attacks, one user can tie up most of the memory

(Have I missed anything?)

#1 is true, but irrelevant. It ignores two important facts: first,
   processes can be killed by stack growth in any case. Second, by the
   time a non-overcommitted system starts failing allocations or killing
   tasks, an overcommitted one would probably still be running normally.
#2 isn't due to overcommitting; it's a kernel "misfeature"
#3 also has nothing to do with overcommit, and can be solved with quotas
   (see the sketch below)
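
As an aside, here is a minimal sketch of the quota idea from #3, using
setrlimit(). It assumes the kernel in question actually enforces RLIMIT_AS
on its allocation paths; real per-user (rather than per-process) quotas
would need kernel support beyond this:

/* Minimal sketch: cap a process's address space with setrlimit().
 * Assumes the kernel enforces RLIMIT_AS on brk()/mmap(); per-user
 * quotas would need more than this. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl = { 64UL << 20, 64UL << 20 };  /* 64 MB soft/hard */

    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* An allocation that would push the address space past 64 MB now
     * fails deterministically instead of waking up the OOM killer. */
    void *p = malloc(128UL << 20);                  /* 128 MB: should fail */
    printf("malloc(128MB) %s\n", p ? "succeeded" : "returned NULL");
    free(p);
    return 0;
}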

> An additional facility -- at the administrator's option they can
>add "vswap" (also from IRIX) "Virtual Swap space":

[description of virtual swap space, essentially "overcommit on demand"]

> This would require no reprogramming of bad apps, but an admin
>would have to explicitly enable some amount of virtual swap space.

Yes, it's marginally better than sproc(); at least it doesn't break app
portability. It's still a kludge, IMO.

>> A better solution is to impose sane, deterministic behavior in the
>> overcommitted case. This can be done with optional memory quotas in
>> conjunction with Rik van Riel's kernel patch. But removing overcommit
>> doesn't solve anything.
>---
> Sure it does. If you run out of memory, then 'malloc' will return
>NULL. (Yeah, I'm changing my story on the fly -- default return
>failure unless vswap is used....then we can have the above)

The "malloc() not returning NULL" argument is complete nonsense. It's not
worth throwing away a useful feature (overcommit) with concrete benefits
over this.

In fact, malloc() not returning NULL isn't a bug, it's a feature. It lets
you do things that you couldn't otherwise accomplish, for no cost.
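
A contrived example, just a sketch: set up a large table and only ever
touch a small, data-dependent part of it. Under overcommit the untouched
pages cost nothing; without overcommit the whole gigabyte has to be backed
by RAM plus swap up front, or the allocation fails:

/* Sketch: allocate a huge, sparse table but only touch a small,
 * data-dependent part of it.  With overcommit the untouched pages
 * never consume RAM or swap; without it, the whole 1 GB would have
 * to be reserved up front (or the malloc() fails). */
#include <stdlib.h>

#define TABLE_SIZE  (1UL << 30)     /* 1 GB of virtual address space */

int main(void)
{
    char *table = malloc(TABLE_SIZE);
    if (!table)
        return 1;

    /* Touch only ~1000 scattered entries; only those pages ever get
     * physical memory. */
    for (unsigned long i = 0; i < 1000; i++)
        table[(i * 1048573UL) % TABLE_SIZE] = 1;

    /* ... use the sparse table ... */
    free(table);
    return 0;
}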

>> > First, the kernel should reserve some amount of memory so it will
>> >never run out of memory.
>>
>> ...and that's hard to do. AFAIK Linux reserves a fraction of memory for
>> the kernel (256 pages on my machine), but doesn't guarantee anything
>> beyond that.
>---
> So it dynamically reserves 256 more pages than it is currently using,
>so it will be 256 pages away from being out of memory when it realizes there
>is a problem? So the kernel running out of memory shouldn't ever happen -- as
>it should always have a 256-page buffer more than it is currently using.

In principle there is no limit to how much memory the kernel might want to
allocate for itself. I suspect that people doing video capture or DRI/GLX
stuff might want more than 256 pages.

> Note, I'm serious about the above CAP's. Again -- if you want to
>protect your 'X', you can make sure it runs with the "don't kill cap".

Ok, I see. I misinterpreted that as a "cap on user space", instead of a
capability.

>> Killing from largest to smallest isn't a good idea. That often makes the X
>> server go first. Have a look at Rik van Riel's OOM killer patch for a
>> better example. I think that the policy of what process to kill should be
>> configurable.
>---
>	Can you describe its behavior? I don't happen to have a copy, but
>if it's a good algorithm, it should be deterministic and well documented.

It's a pretty good heuristic. The algorithm is:

        badness = (memory used) / sqrt(CPU time) / (run time)^(1/4)
        badness is doubled if the task is niced
        badness is divided by 4 if uid or euid is zero (or the task has
                the sysadmin capability)
        badness is divided by 2 if the process has direct hardware access

Then the process with the highest badness is killed.
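
Very roughly, in C (my own paraphrase of the heuristic, not Rik's actual
code; the inputs are plain numbers rather than task_struct fields):

/* Rough, self-contained paraphrase of the heuristic above -- not Rik's
 * actual patch.  Inputs are plain numbers instead of task_struct fields. */
#include <math.h>
#include <stdio.h>

double badness(double mem_used,      /* e.g. total VM size           */
               double cpu_time,      /* CPU time consumed so far     */
               double run_time,      /* time since the task started  */
               int    is_niced,      /* nice value > 0?              */
               int    is_root,       /* uid/euid 0 or CAP_SYS_ADMIN  */
               int    raw_hw_access) /* CAP_SYS_RAWIO                */
{
    double b = mem_used / sqrt(cpu_time + 1.0)
                        / pow(run_time + 1.0, 0.25);   /* fourth root */

    if (is_niced)      b *= 2.0;
    if (is_root)       b /= 4.0;
    if (raw_hw_access) b /= 2.0;
    return b;          /* the task with the highest badness is killed */
}

int main(void)
{
    /* a bloated, short-lived, niced user task vs. a long-running root
     * daemon: the first scores far higher and is killed first */
    printf("hog:    %g\n", badness(200000, 1,    10,    1, 0, 0));
    printf("daemon: %g\n", badness(  5000, 3600, 86400, 0, 1, 0));
    return 0;
}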

I think this is a good default, but it would be nice to have some external
control over a task's badness; say an entry in task_struct, with a syscall
to allow a daemon to set it based on some configuration file. This would
allow the functionality of your capability, but at a finer-grained
level. I am working on this (slowly... I want to get it right, and as I
said, I normally do huge numerical simulations, not system level
programming).
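
To be concrete about what I mean, something along these lines -- purely
hypothetical, none of it exists today, and the "syscall" is only a stub so
the sketch compiles:

/* Purely hypothetical sketch of the interface I have in mind: a per-task
 * badness modifier that a daemon sets from a config file.  A real
 * implementation would add a field to task_struct and a syscall (or
 * /proc interface) to set it; here the call is just a stub. */
#include <stdio.h>
#include <sys/types.h>

/* stand-in for the hypothetical syscall */
static int set_badness_shift(pid_t pid, int shift)
{
    printf("would ask the kernel to shift badness of pid %d by %d\n",
           (int)pid, shift);
    return 0;
}

int main(void)
{
    /* a daemon might parse something like an oom config file and do: */
    set_badness_shift(1234, -2);   /* protect the X server          */
    set_badness_shift(5678, +2);   /* prefer killing the batch job  */
    return 0;
}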

>> Again, this isn't very meaningful. Any non-deterministic behavior isn't a
>> result of overcommitment, it's due to the fact that the kernel hasn't been
>> informed of what to do when OOM. That can be fixed without removing memory
>> overcommitment. Just implement quotas, or alternately task priorities and
>> have the kernel kill the lowest priority tasks first. After all, by the
>> time you start killing tasks on an overcommitted system, you would have
>> been killing tasks long before without overcommit...
>---
> No -- you'd return failures on malloc (assuming no virtual swap).
>Not random or senseless killing. Then each app can choose what to do when
>it runs into an out of memory condition instead of expecting that the sysadmin
>will know the correct behavior for every app running on the system.

When you run out of memory, overcommit or not, tasks are going to die.
Removing overcommit might make malloc() return NULL, but malloc() is only
one of a host of ways to allocate memory. The others -- stack growth,
copy-on-write after fork() -- have no return value to check. So arguing
that "overcommit is bad, because it breaks the malloc() return value"
misses the point.
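
For instance, a sketch of both paths; neither gives the program an error
code it could handle:

/* Sketch: two ways a process commits memory with no error return, no
 * matter what malloc() does.  If the kernel can't back these pages, it
 * can only kill something (or it must have reserved the memory up front). */
#include <string.h>
#include <unistd.h>

static void deep(int n)
{
    char buf[8192];                     /* stack growth: no return value */
    memset(buf, 0, sizeof(buf));
    if (n > 0)
        deep(n - 1);
}

int main(void)
{
    deep(100);                          /* grows the stack page by page */

    if (fork() == 0) {                  /* child shares pages COW       */
        static char data[1 << 20];
        memset(data, 1, sizeof(data));  /* COW/demand faults: again no
                                           return value                 */
        _exit(0);
    }
    return 0;
}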

>> The problem is not overcommit. The problem is that the system doesn't
>> handle OOM well. It would be better to solve the problem than cover it up
>> under some new system call.
>---
> The system would handle it just fine if you returned NULL on mallocs
>or ENOMEM/EAGAIN on forks. So what would you want: 1) when I'm in vi and
>attempt to spawn a shell, it returns "insufficient memory", or 2) the system
>starts deciding by some sysadmin-set policy what to kill first.
>
>	The user of such a system wouldn't know what to expect.

The problem here is that the system is out of memory. This has nothing to
do with overcommitment. The solution is to make the kernel more
intelligent about killing processes when OOM. Anything else just hides the
problem.

Dave

David Whysong dwhysong@physics.ucsb.edu
Astrophysics graduate student University of California, Santa Barbara
My public PGP keys are on my web page - http://www.physics.ucsb.edu/~dwhysong
DSS PGP Key 0x903F5BD6 : FE78 91FE 4508 106F 7C88 1706 B792 6995 903F 5BD6
D-H PGP key 0x5DAB0F91 : BC33 0F36 FCCD E72C 441F 663A 72ED 7FB7 5DAB 0F91
