Re: Endless overcommit memory thread.

From: Linda Walsh (law@sgi.com)
Date: Sat Mar 25 2000 - 00:47:01 EST


David Whysong wrote:
>
> Partly right. But the real reason for overcommit is practical: you can do
> more with overcommitted memory, and there are no additional failure modes.
>
> If we define "failure" as a situation where a user program must be killed
> because we are out of memory, then non-overcommit systems should always
> fail more often than overcommitted ones. So I don't see any downside, AT
> ALL, to memory overcommit.

---
	You have non-deterministic behavior.  

First, the kernel should reserve some amount of memory so it will never run out of memory. Ideally, there should be two limits. One level would require processes have UID==0 (or some CAP - CAP_USE_RESERVE_SPACE) to alloc beyond, a second the kernel reserves for itself. If all processes become blocked on waiting for memory, the kernel starts killing user-level processes with the largest first. Probably another CAP for CAP_DONT_KILL_FOR_MEM to protect system processes executing in user space.

For the fork case, a 2nd system call -- I propose 'sproc' -- only because it's a model used to solve the overcommit problem. From the man page:

C SYNOPSIS #include <sys/types.h> #include <sys/prctl.h> pid_t sproc (void (*entry) (void *), unsigned inh, ...); Type of optional third argument: void *arg; pid_t sprocsp (void (*entry) (void *, size_t), unsigned inh, void *arg, caddr_t sp, size_t len); DESCRIPTION The sproc and sprocsp system calls are a variant of the standard fork(2) call. Like fork, the sproc calls create a new process that is a clone of the calling process. The difference is that after an sproc call, the new child process shares the virtual address space of the parent process (assuming that this sharing option is selected, as described below), rather than simply being a copy of the parent. The parent and the child each have their own program counter value and stack pointer, but all the text and data space is visible to both processes. This provides one of the basic mechanisms upon which parallel programs can be built.

A group of processes created by sproc calls from a common ancestor is referred to as a share group or shared process group. A share group is initially formed when a process first executes an sproc or sprocsp call. All subsequent sproc calls by either the parent or other children in this share group will add another process to the share group. In addition to virtual address space, members of a share group can share other attributes such as file tables, current working directories, effective userids and others described below. ...

It describes stack interactions then:

Calling sproc or sprocsp too often, when the stack size is set very large can easily cause the share group to grow larger than the per-process maximum allowable size {PROCSIZE_MAX} [see intro(2)]. In this case, the call will fail and return ENOMEM. A process with lots of distinct virtual spaces (e.g. lots of files mapped via mmap(2)) can fragment the calling process's address space such that it is impossible to find a suitable place for the new child's stack. This case will also cause sproc or sprocsp to fail. ... More about how to generate specific share behavior...then failure modes:

[ENOMEM] If there is not enough virtual space to allocate a new stack. The default stack size is settable via prctl(2), or setrlimit(2). [EAGAIN] The system-imposed limit on the total number of processes under execution, {NPROC} [see intro(2)], would be exceeded. [EAGAIN] The system-imposed limit on the total number of processes under execution by a single user {CHILD_MAX} [see intro(2)], would be exceeded. [EAGAIN] Amount of system memory required is temporarily unavailable. [EINVAL] sp was null and len was less than 8192. [EPERM] The system call is not permitted from a pthreaded program (see CAVEATS section below).

-------- Then there's a bunch more implementation details and caveats.

The point is that if you want trusted behavior, you want it to be deterministic so the behavior is very predictable. I assert this is also important to commercial institutions.

Telling users: "um, yeah we guarantee auditing (C2 security), but a user can possibly kill off auditing or other random system processes, covering all their tracks because Linux is inherently unreliable and non-deterministic" is REALLY bad.

-- Linda A Walsh | Trust Technology, Core Linux, SGI law@sgi.com | Voice: (650) 933-5338

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:14 EST