Re: Endless overcommit memory thread.

From: Linda Walsh (law@sgi.com)
Date: Sat Mar 25 2000 - 00:47:01 EST

Next message: Rick van Rein: "Re: Patch: BadRAM put to use"
Previous message: Jim Roland: "Nostalgia: System V Release 2 filesystem"
In reply to: David Whysong: "Re: Endless overcommit memory thread."
Next in thread: David Whysong: "Re: Endless overcommit memory thread."
Reply: David Whysong: "Re: Endless overcommit memory thread."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

David Whysong wrote:
>
> Partly right. But the real reason for overcommit is practical: you can do
> more with overcommitted memory, and there are no additional failure modes.
>
> If we define "failure" as a situation where a user program must be killed
> because we are out of memory, then non-overcommit systems should always
> fail more often than overcommitted ones. So I don't see any downside, AT
> ALL, to memory overcommit.

---
	With overcommit, you get non-deterministic behavior.
	First, the kernel should reserve some amount of memory so that it
never runs out itself.  Ideally, there should be two limits.  Allocating
beyond the first would require the process to have UID==0 (or some
capability, e.g. CAP_USE_RESERVE_SPACE); the second is a reserve the kernel
keeps for itself.  If all processes become blocked waiting for memory, the
kernel starts killing user-level processes, largest first.  There should
probably be another capability, e.g. CAP_DONT_KILL_FOR_MEM, to protect system
processes executing in user space.
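A minimal C sketch of the two-limit reserve and the largest-first kill policy described above. Everything in it is hypothetical: struct mem_state, may_commit, pick_victim and the page counts are illustrative names, not kernel interfaces, and the proposed CAP_USE_RESERVE_SPACE / CAP_DONT_KILL_FOR_MEM capabilities are modeled as plain caller classes and flags. It assumes the two reserves together never exceed the total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposed two-limit reserve; not a kernel API. */
struct mem_state {
    size_t total;          /* total committable pages */
    size_t committed;      /* pages already promised to processes */
    size_t priv_reserve;   /* usable only with CAP_USE_RESERVE_SPACE / UID 0 */
    size_t kernel_reserve; /* usable only by the kernel itself */
};

enum caller { CALLER_USER, CALLER_PRIVILEGED, CALLER_KERNEL };

/* True if `who` may commit `pages` more pages without breaching its limit. */
bool may_commit(const struct mem_state *m, enum caller who, size_t pages)
{
    /* Assumption: the two reserves fit inside the total. */
    assert(m->priv_reserve + m->kernel_reserve <= m->total);

    size_t limit = m->total;
    if (who != CALLER_KERNEL)
        limit -= m->kernel_reserve;   /* only the kernel may use its reserve */
    if (who == CALLER_USER)
        limit -= m->priv_reserve;     /* ordinary processes stop even earlier */

    return m->committed <= limit && pages <= limit - m->committed;
}

/* Hypothetical per-process record for victim selection. */
struct proc {
    int pid;
    size_t size;           /* pages charged to the process */
    bool dont_kill;        /* stands in for the proposed CAP_DONT_KILL_FOR_MEM */
};

/* Index of the largest killable process, or -1 if every process is protected. */
int pick_victim(const struct proc *procs, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (procs[i].dont_kill)
            continue;
        if (best < 0 || procs[i].size > procs[best].size)
            best = (int)i;
    }
    return best;
}
```

With total = 1000, kernel_reserve = 100 and priv_reserve = 50, an ordinary process hits its ceiling at 850 committed pages, a privileged one can go to 900, and the kernel to 1000. Failure then shows up as a refused allocation at a known threshold rather than as an arbitrary process being killed later.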
	For the fork case, I propose a second system call, 'sproc', if only
because it is an existing model that has been used to solve the overcommit
problem.  From the IRIX man page:
C SYNOPSIS
     #include <sys/types.h>
     #include <sys/prctl.h>
 
     pid_t sproc (void (*entry) (void *), unsigned inh, ...);
 
     Type of optional third argument:
     void *arg;
 
     pid_t sprocsp (void (*entry) (void *, size_t), unsigned inh,
                    void *arg, caddr_t sp, size_t len);
 
DESCRIPTION
     The sproc and sprocsp system calls are a variant of the standard fork(2)
     call.  Like fork, the sproc calls create a new process that is a clone of
     the calling process.  The difference is that after an sproc call, the new
     child process shares the virtual address space of the parent process
     (assuming that this sharing option is selected, as described below),
     rather than simply being a copy of the parent.  The parent and the child
     each have their own program counter value and stack pointer, but all the
     text and data space is visible to both processes.  This provides one of
     the basic mechanisms upon which parallel programs can be built.            
     A group of processes created by sproc calls from a common ancestor is
     referred to as a share group or shared process group.  A share group is
     initially formed when a process first executes an sproc or sprocsp call.
     All subsequent sproc calls by either the parent or other children in this
     share group will add another process to the share group.  In addition to
     virtual address space, members of a share group can share other
     attributes such as file tables, current working directories, effective
     userids and others described below.
                                                                               
...
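Linux has no sproc, but the closest analog is clone(2) with CLONE_VM, which likewise creates a process that shares the caller's address space and runs on a caller-supplied stack (compare sprocsp's sp/len arguments). A minimal sketch, assuming Linux and glibc; run_shared_child, child_entry and CHILD_STACK_LEN are hypothetical names. The relevance to overcommit: because the child shares the parent's pages instead of receiving a copy-on-write duplicate, the only new user memory it needs is its stack.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_LEN (64 * 1024)   /* illustrative size, not an IRIX default */

static int shared_counter;            /* visible to parent and child via CLONE_VM */

static int child_entry(void *arg)
{
    shared_counter += *(int *)arg;    /* this write lands in the parent's memory */
    return 0;
}

/* Spawn a child sharing our address space, wait for it, and return the
 * value of shared_counter afterwards (or -1 on error). */
int run_shared_child(int increment)
{
    char *stack = malloc(CHILD_STACK_LEN);
    if (stack == NULL)
        return -1;
    /* The stack grows down on most architectures, so pass its top. */
    pid_t pid = clone(child_entry, stack + CHILD_STACK_LEN,
                      CLONE_VM | SIGCHLD, &increment);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_counter;
}
```

A first call to run_shared_child(5) returns 5, because the child's write is made to the parent's shared_counter; had the child been created with plain fork(2), it would have modified its own copy and the parent would still see 0.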
	It then describes stack interactions:
     Calling sproc or sprocsp too often, when the stack size is set very large
     can easily cause the share group to grow larger than the per-process
     maximum allowable size {PROCSIZE_MAX} [see intro(2)].  In this case, the
     call will fail and return ENOMEM.
 
     A process with lots of distinct virtual spaces (e.g. lots of files mapped
     via mmap(2)) can fragment the calling process's address space such that
     it is impossible to find a suitable place for the new child's stack.
     This case will also cause sproc or sprocsp to fail.
...
	Then more about how to get specific sharing behavior, followed by the failure modes:
     [ENOMEM]       If there is not enough virtual space to allocate a new
                    stack.  The default stack size is settable via prctl(2),
                    or setrlimit(2).
 
     [EAGAIN]       The system-imposed limit on the total number of processes
                    under execution, {NPROC} [see intro(2)], would be
                    exceeded.
 
     [EAGAIN]       The system-imposed limit on the total number of processes
                    under execution by a single user {CHILD_MAX} [see
                    intro(2)], would be exceeded.
 
     [EAGAIN]       Amount of system memory required is temporarily
                    unavailable.
 
     [EINVAL]       sp was null and len was less than 8192.
 
     [EPERM]        The system call is not permitted from a pthreaded program
                    (see CAVEATS section below).
--------
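The stack-related rules quoted above can be condensed into a small pre-flight check. This is a minimal sketch, not IRIX's implementation: sproc_stack_check, MIN_STACK_LEN and the byte-count parameters are hypothetical, it assumes a NULL sp means the system allocates a len-byte stack inside the share group, and it ignores the address-space fragmentation case, which a simple size comparison cannot model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MIN_STACK_LEN 8192   /* minimum from the quoted EINVAL rule */

/*
 * Hypothetical pre-flight check modeled on the quoted sproc/sprocsp
 * failure modes; not IRIX code.  Assumes a NULL sp asks the system to
 * allocate a len-byte stack inside the share group's address space.
 * Returns 0, EINVAL, or ENOMEM.
 */
int sproc_stack_check(const void *sp, size_t len,
                      size_t group_size, size_t procsize_max)
{
    if (sp != NULL)
        return 0;                  /* caller supplied the stack; nothing new to map */
    if (len < MIN_STACK_LEN)
        return EINVAL;             /* "sp was null and len was less than 8192" */
    /* The new stack must fit under the per-process limit (PROCSIZE_MAX). */
    if (group_size > procsize_max || len > procsize_max - group_size)
        return ENOMEM;
    return 0;
}
```

Repeated calls that each add a large stack to group_size eventually return ENOMEM, which is the "calling sproc too often" case; the failure is reported to the caller at call time rather than surfacing later as a killed process.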
 
	After that there are a bunch more implementation details and caveats.
	The point is that if you want trusted behavior, it has to be
deterministic, so that the system's behavior is predictable.  I would argue
this is also important to commercial institutions.
	Telling users "um, yeah, we guarantee auditing (C2 security), but a
user can potentially kill off auditing or other random system processes,
covering all their tracks, because Linux is inherently unreliable and
non-deterministic" is REALLY bad.
-- 
Linda A Walsh                    | Trust Technology, Core Linux, SGI
law@sgi.com                      | Voice: (650) 933-5338


This archive was generated by hypermail 2b29 : Fri Mar 31 2000 - 21:00:14 EST