Re: Some questions about linux kernel.

From: James Sutherland (jas88@cam.ac.uk)
Date: Mon Mar 20 2000 - 07:12:16 EST


On Sun, 19 Mar 2000 15:50:07 -0600, you wrote:

>On Sun, 19 Mar 2000, James Sutherland wrote:
>>On Sun, 19 Mar 2000 13:00:03 -0600, you wrote:
>>>On Sat, 18 Mar 2000, Horst von Brand wrote:
>>>>Jesse Pollard <pollard@cats-chateau.net> said:
>>>>
>>>>[...]
>>>>
>>>>> It makes it better. The specific process causing the problem is now
>>>>> identified.
>>>>
>>>>The only thing you identify this way is the first unsuspecting victim which
>>>>happens to step into your trap. The process itself may very well be
>>>>innocent, it just finds itself in a position where making a legitimate
>>>>request at the wrong moment gets it shot. It depends on all the other stuff
>>>>the system is doing at the time: Running NS on an idle, 32MB machine will
>>>>work; if it is also compiling a kernel and rebuilding XFree86 ATM,
>>>>something will have to give. Who is at fault? NS isn't, it is doing its
>>>>job. Neither are the make, sh, or gcc and whatnot processes doing anything
>>>>illegal. Note that all this is completely unrelated to user quotas and
>>>>such: User quotas just shift the problem to the individual users, some of
>>>>whose innocent processes get killed because of the other things they are
>>>>doing. It can be argued that the user is being held responsible this way,
>>>>and this is in a sense true. But the process in itself is doing nothing
>>>>wrong.
>>>
>>>Except that the user IS exceeding what was made available via quotas.
>>
>>Yes - so what? All you are doing is lowering this limit. Someone uses
>>more memory than they should, so one or more of their processes dies.
>>Nothing whatsoever to do with overcommit.
>>
>>>>This is one of the reasons why OOM handling is hard: Very often there _is_
>>>>no single culprit that can be blamed and clearly deserves to be shot, there
>>>>is just a bunch of innocent people some of whom you have to kill or else
>>>>all are dead.
>>>
>>>This is why resource allocation and quotas are important in multiuser systems.
>>
>>So it would be really nice if we HAD those quotas...
>>
>>>>You also try to protect the system against malicious (or runaway)
>>>>processes. To protect against a runaway memory leak (or dumb mallocbomb)
>>>>is rather easy, essentially you shoot the largest/fastest growing process.
>>>>Protection against malicious processes is much, much harder, as they _will_
>>>>take advantage of blind spots in the detection scheme.
>>>
>>>Unfortunately, that may not be the right one. The "mallocbomb" may not be
>>>growing very fast - but the web server might be as compared to the bomb.
>>>Or it might be telnetd/login starting a new session. Or cron starting
>>>a new job. Or inetd starting a new ftpd. Or init trying to restart a
>>>getty on the console which just logged out...
>>
>>Occasional friendly fire is inevitable in emergency situations. I'd
>>still rather have one specific process killed (and logged) than have
>>random process deaths.
>>
>>>>Then there is the fact that this is (or should be, at least for a
>>>>well-speced machine) a very rare event, so very little (if any) effort
>>>>during normal operation is warranted. This includes the OS and machine, and
>>>>even much more importantly the sysadmin.
>>>
>>>It cannot be spec'ed without having something to measure against. This is
>>>the other place resource quotas come in.
>>>
>>>This can be compared to disk quotas. Nobody will really fault the use
>>>of disk quotas.
>>>1. How many file systems are overcommitted?
>>Totally different type of "overcommit".
>
>Not completely - it is a difference of degree. At this point it is
>users instead of processes. Users request a certain amount of disk space.
>The quota controls prevent the user from exceeding it.

Quotas have nothing to do with it.

>>>2. Whose decision was that?
>>The FS designers, in this sense.
>
>NOPE - the management that directed it to be overcommitted, if quota is
>supported.

Still no connection.

>>>3. Who detects the failure?
>>Whoever gets their FS hosed by optimistic code.
>
>Almost - whoever gets their data lost because there is no space. If quotas
>are available then it is the user. If quotas are not available then the
>OS gets blamed.

Again, you confuse overcommit (demand allocation) with quota
management.

>>>4. Who gets the blame when it occurs?
>>The programmer.
>
>NOPE - the administration, if it is overcommitted; the OS if quotas are
>not available.

See above.

>>>If filesystems are not overcommitted, then the total of all legal users'
>>>quotas will equal the size of available data space (meta-data included as
>>>"data" here). An overcommitted file system is one where the total of
>>>all users' quotas exceeds the size of the filesystem.
>>Totally different use of "overcommit".
>>
>>>Filesystem failure doesn't occur until enough users use up the space.
>>>No single user may have even reached their limit. But random write
>>>failures occur, databases may become corrupt, and a lot of blame
>>>is going to be assigned.
>>The type of overcommit we are talking about (i.e. VM), the closest
>>analogy is a sparse file. A process requests a 100MB file, and gets a
>>100MB sparse file.
>>
>>>Answers with quota controls:
>>> #2: Overcommitting is a management choice.
>>> #3: the first user that fills the filesystem without using up their quota
>>> #4: Management.
>>>
>>>Without quotas being available:
>>> #2: the analyst that selected the system
>>> #3: the first user that fills the filesystem
>>> #4: the operating system
>>>
>>>Do I use quotas? YES.
>>>Do I have overcommit? YES - management said to do so.
>>>Does the system or analyst get blamed for failure? NO.
>>>
>>>I need the same controls on Linux servers that I have on other servers.
>>>
>>>I can be a lot more lax on restrictions where the system is dedicated
>>>to a single user than I can where it is a multiuser server.
>>
>>You are misunderstanding what we mean by "overcommit". I am talking
>>about allocating a process memory when it USES it, rather than when it
>>REQUESTS it. This has nothing to do with how (or indeed if) I set any
>>quota facilities I may (or on Linux, may NOT) have available.
>
>NOPE - if the process requests it and is granted it, then it should have
>access to it. It is not up to the system to say "here it is, but don't use
>that part of it, because I really didn't give it".

malloc() just allocates address SPACE. TOUCHING the memory then
populates that space, which is how I can use malloc() for a nice,
simple sparse matrix implementation. (On Win32, you have to mess
around trapping exceptions in userland and allocating the memory
yourself...)

>We are talking about the sum of all concurrent requests, and the system
>aborting when part of the requests already granted turns out to not be granted.

No. The malloc() call requested address space, and that succeeded -
the address space was available.

What is NOT available, however, is memory to populate that area fully.
So what? I could well be using malloc() to allocate a SPARSE matrix,
in which case I certainly do NOT want it to populate the whole thing!

I would be seriously irritated if your disk quota system denied me
permission to create a sparse file on the basis that, if fully
populated, it would overrun my quota. Why apply the same system to
memory?

James.
