Re: Some questions about linux kernel.

From: Jesse Pollard (pollard@cats-chateau.net)
Date: Mon Mar 20 2000 - 22:18:50 EST


On Mon, 20 Mar 2000, Richard B. Johnson wrote:
>On Mon, 20 Mar 2000, Marco Colombo wrote:
>
>> On Sun, 19 Mar 2000, Jesse Pollard wrote:
>>
>> [...]
>> > NOPE - if the process requests it and is granted it, then it should have
>> > access to it. It is not up to the system to say "here it is, but don't use
>> > that part of it, because I really didn't give it".
>>
>> malloc() does grant you what you (the process) asked for: it extends your
>> valid address space. It does not grant you any bit of RAM (page-frames
>> will be allocated when you access them, if available, or the process will be
>> put to sleep) or swap space (with overcommitting, it will be allocated when
>> the pages are paged-out). If you want the kernel to *grant* something,
>> you'll have to ask for it. If you want RAM, not just VM, use mlock(). If
>> you don't need RAM, but you do need safe backing store, just create a file,
>> fill it, and mmap() it. If applications die because they treat malloc() as
>> mlock(), that's the programmer's fault, not a kernel issue. malloc() is just
>> an interface to brk(): it does not "allocate" anything (despite the name)
>> (kernel Gurus: maybe it allocates PTEs and other kernel resources, ok...).
>> So when you access that part of your address space, you should be prepared
>> to do I/O instead of a memory access (most applications are unaware of it,
>> just because they don't need to know... but applications that depend on
>> performance, such as benchmarks, WILL notice it), or even expect a failure.
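
A rough sketch of Marco's "create a file, fill it, and mmap() it" point,
in C (my illustration, not from either mail; the file name and sizes are
arbitrary, and error handling is minimal):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1024 * 1024;       /* 1 MB of guaranteed store */
    char buf[4096];
    size_t done;
    char *p;
    int fd;

    /* "create a file, fill it": the disk blocks now really exist. */
    fd = open("backing.dat", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }
    memset(buf, 0, sizeof(buf));
    for (done = 0; done < len; done += sizeof(buf))
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write"); return 1;
        }

    /* "... and mmap() it": a shared file mapping has its backing
     * store already, so paging it out cannot fail for lack of swap. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* If the pages must also stay in RAM, pin them (mlock() may
     * require privilege). */
    if (mlock(p, len) < 0)
        perror("mlock");

    strcpy(p, "data with guaranteed backing store");
    puts(p);

    munmap(p, len);
    close(fd);
    return 0;
}
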
>>
>> > We are talking about the sum of all concurrent requests, and the system
>> > aborting when part of what was already granted turns out not to have been granted.
>>
>
>
>Malloc(), as stated before, just sets a new break address when it
>runs out of heap. It keeps track of the heap, but not very carefully.
>
>Memory on real machines is allocated in pages. Even the kernel doesn't
>know if you have overwritten allocated space until you write to a
>page that wasn't allocated.
>
>In the following code, I allocate so little memory from malloc() that
>it is quite likely completely satisfied by whatever is in the heap.
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <string.h>
>int main(void)
>{
> char *p;
> p = malloc(0x10);
> strcpy(p, "01234567890ABCDEF0123456789ABCDEF\n");
> puts(p);
> return 0;
>}
>
>Note that I deliberately overwrite the buffer! The machine does not
>seg-fault because whatever is in the heap is already owned by the
>process. However, subsequent calls to malloc() may fail because I
>just might have corrupted my heap. Of course it's my heap, owned
>by my process, so if I want to corrupt it, rendering malloc()
>unusable, it's my business. The kernel doesn't even know nor care.
>
>So as you can see malloc() doesn't really allocate anything. It
>just keeps track of whatever is in the heap and asks the kernel
>for new pages, by setting the break address, whenever a local
>allocation would fail.
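
As an aside, the movement of the break is easy to watch (my sketch, not
from the original mail; exact addresses will vary, and a modern malloc()
may use mmap() instead of brk() for large requests):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *before, *after;
    char *p;

    before = sbrk(0);            /* current break address */
    p = malloc(64 * 1024);       /* typically served by extending the heap */
    after = sbrk(0);

    printf("break before malloc(): %p\n", before);
    printf("break after  malloc(): %p\n", after);
    printf("malloc() returned    : %p\n", (void *)p);

    /* If the break moved, malloc() satisfied the request by asking the
     * kernel for more address space; no page frames were handed out yet. */
    free(p);
    return 0;
}
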
>
>Setting a new break address just adds a new page to the process
>page table. The new page is marked 'not present'. Nothing is allocated.
>This is a performance enhancement. If I attempt to access a page that
>is not in the process page-table, the page-fault handler will send a
>fatal signal to the process (seg-fault). If I attempt to access a page
>that exists in the process page-table, but is not in memory
>(the default case), the kernel will fault in a new page. Then the
>kernel marks the page 'present' and subsequent accesses will not
>cause a page-fault.
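
That lazy behaviour can be observed from user space (my sketch, not from
the original mail; /proc/self/statm reports sizes in pages, the second
field being the resident set, and exact numbers will vary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static long resident_pages(void)
{
    long size = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");

    if (!f)
        return -1;
    if (fscanf(f, "%ld %ld", &size, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

int main(void)
{
    size_t len = 32 * 1024 * 1024;  /* 32 MB of address space */
    char *p = malloc(len);

    if (!p)
        return 1;
    printf("resident after malloc(): %ld pages\n", resident_pages());
    memset(p, 1, len);              /* fault every page in */
    printf("resident after touching: %ld pages\n", resident_pages());
    free(p);
    return 0;
}
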

No problem there - as long as the total allocated is accounted for
and can be used.

>This happens on a page-by-page basis, conserving real pages when
>possible. Page-faults are caused by hardware and they are very
>fast. If a free page is present in memory, adding that page to
>the page-table (by software) is also very fast. However, attempting
>to free pages is very slow because they have to be stolen and
>their contents written to disk storage.

So is incrementing/decrementing a counter.

>A problem occurs when there are no longer any free pages to steal.
>Since a read/write attempt was made to a page that will never be
>present, the kernel can't just return control to the faulting task.
>If it did so, the faulting task would think that whatever it read
>or whatever it wrote was, in fact, secure in RAM. So, once you
>are out of virtual RAM, you are in a heap of trouble, pun intended.

TOO LATE - unless, of course, the system is expected to fail.

>Suppose there is a way of solving this problem. It could be transparent
>to applications; they would just have to be recompiled. Suppose
>malloc() were changed so that it contained a signal handler.
>
>When malloc() attempts to set a new break address, it sets up a
>handler. Then it calls the kernel to set a new break address.
>Malloc(), before accepting this address, could write a word of
>zeros to the top of the allocation. This could cause a page-fault. If
>the page-fault handler could not fault in a new page, it could
>send a signal to the process (received by malloc()). Malloc()
>can then return NULL for the current allocation request. In this
>manner, the caller of malloc() would always be assured that memory
>was available.
>
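
In rough C, the probe Richard describes might look like the sketch below.
This is my illustration, not code from either mail, and it assumes the
kernel would deliver a catchable SIGSEGV when a new page cannot be
backed, which Linux does not promise (the out-of-memory path may simply
kill the process):

#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

static void probe_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if the byte at 'top' could be written, 0 otherwise. */
static int probe_top_of_allocation(volatile char *top)
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = probe_handler;
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(probe_env, 1) == 0) {
        *top = 0;               /* force the page to be faulted in now */
        ok = 1;                 /* the write completed: backing exists */
    }
    /* on failure, the assumed signal lands here via siglongjmp, ok == 0 */

    sigaction(SIGSEGV, &old, NULL);
    return ok;
}

int main(void)
{
    char buf[64];

    /* Probing a byte we certainly own should report success. */
    return probe_top_of_allocation(buf + sizeof(buf) - 1) ? 0 : 1;
}
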
>Unfortunately, this is naive. The first time the break address was
>extended, this would work. However, what happens after the kernel
>steals pages from your task to satisfy other requests? Eventually
>pages that you thought you owned, have to be faulted in. There may
>be no more pages to steal so you, thinking you have safely allocated
>real pages, are now deadlocked --and dead.

It can't be done in user space. The kernel must decrement the page
quota value every time a page is allocated to the user, and when the
quota reaches zero the allocating process is given a failure.
If per-user quotas (rather than per-process ones) are used, the
failure may land on an essentially random process owned by that user
(not likely the parent shell, since that usually waits for the command
to complete before continuing). New process starts for that user may
fail with ENOMEM, or a "can't fork (or exec) - out of memory quota"
error may be reported.
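
In schematic C, the accounting amounts to something like this (purely my
illustration, not code from the Linux source; the structure and function
names are invented):

#include <stdio.h>

/* One record per user, holding the remaining page quota. */
struct user_quota {
    long pages_left;
};

/* Would be called on the kernel's page-allocation path: charge one
 * page frame to the owning user, or report failure. */
static int charge_page(struct user_quota *u)
{
    if (u->pages_left <= 0)
        return -1;              /* caller delivers the failure: ENOMEM,
                                   a fatal signal, a refused fork, ... */
    u->pages_left--;
    return 0;
}

/* Would be called whenever a frame charged to the user is released. */
static void uncharge_page(struct user_quota *u)
{
    u->pages_left++;
}

int main(void)
{
    struct user_quota u = { 2 };    /* this user may own two pages */
    int i;

    for (i = 0; i < 3; i++)
        printf("page %d: %s\n", i,
               charge_page(&u) == 0 ? "granted" : "over quota");
    /* the third request is refused; nothing was over-promised */
    uncharge_page(&u);
    return 0;
}
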

>The only solution to an out-of-memory condition is to never run
>out of memory. The place where all of the system information is
>known is in "user space". The kernel readily "knows" stuff about the
>current process, but retrieving information about other tasks in
>a page-fault handler would result in an extremely poor performing
>machine. A user-space daemon can acquire information about all the
>tasks, can detect runaway tasks, can safeguard special tasks like
>Web Servers that haven't gone crazy, and can watch for performance
>hurting rogue programs.
>
>Such a program, if properly designed, is the solution to such
>out-of-memory conditions.

It can't be done in user space. There is not enough time for all of the
necessary context switches to accomplish the job (at least not all of
it).

NOW - I'm not saying NONE of it can be done in user space. That is where
policy decisions are made. If the userspace daemon is consulted to
determine how much quota each user is to be given, and the daemon keeps
track of the number of users and the amount of remaining reserve, then
the kernel can enforce whatever policy the daemon sets.

Doing this at login time is quite reasonable. Doing this on every fork
might be reasonable. Doing this on every page fault is not (and I don't
think you were meaning that anyway).
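
The hand-off itself could be as small as the daemon writing each user's
budget through some kernel interface at login time. The sketch below is
purely hypothetical (there is no /proc/quota/pages file; the path and
format only stand in for whatever interface the kernel would expose):

#include <stdio.h>

/* Hypothetical: tell the kernel how many page frames 'uid' may own. */
static int set_page_quota(unsigned int uid, long pages)
{
    FILE *f = fopen("/proc/quota/pages", "w");  /* invented path */

    if (!f)
        return -1;
    fprintf(f, "%u %ld\n", uid, pages);
    return fclose(f);
}

int main(void)
{
    /* e.g. at login: give uid 1001 a budget of 8192 page frames */
    return set_page_quota(1001, 8192) ? 1 : 0;
}
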

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@cats-chateau.net

Any opinions expressed are solely my own.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


