Re: Some questions about linux kernel.

From: Richard B. Johnson (root@chaos.analogic.com)
Date: Mon Mar 20 2000 - 17:19:57 EST

Next message: Aaron Tiensivu: "Re: ALPHA: 2.2.15-pre15: math emulation problem?"
Previous message: Martin Josefsson: "Possible TCP socket bug in kernel >= 2.2.15pre2 and recent 2.3"
In reply to: Marco Colombo: "Re: Some questions about linux kernel."
Next in thread: Alan Cox: "Re: Some questions about linux kernel."
Reply: Alan Cox: "Re: Some questions about linux kernel."
Reply: David Whysong: "Re: Some questions about linux kernel."
Reply: Jesse Pollard: "Re: Some questions about linux kernel."
Reply: Marco Colombo: "Re: Some questions about linux kernel."

On Mon, 20 Mar 2000, Marco Colombo wrote:

> On Sun, 19 Mar 2000, Jesse Pollard wrote:
>
> [...]
> > NOPE - if the process requests it and is granted it, then it should have
> > access to it. It is not up to the system to say "here it is, but don't use
> > that part of it, because I really didn't give it".
>
> malloc() does grant you what you (the process) asked for: it extends your
> valid address space. It does not grant you any bit of RAM (page-frames
> will be allocated when you access them, if available, or the process will be
> put to sleep) or swap space (with overcommitting, it will be allocated when
> the pages are paged-out). If you want the kernel to *grant* something,
> you'll have to ask for it. You want RAM, not just VM, use mlock(). You don't
> need RAM, but you need safe backing store, just create a file, fill it, and
> mmap() it. If applications die because they treat malloc() as mlock(), it's
> programmer's fault, not a kernel issue. malloc() it's just an interface to
> brk(): it does not "allocate" anything (despite of the name) (kernel Gurus:
> maybe it allocates PTEs and other kernel resources, ok...).
> So when you access that part of your address space, you should be prepared
> in doing I/O instead of memory access (most applications are unaware of it,
> just because they need not to know... but applications depending on
> performances, such as benchmarks, WILL notice it), or even expect a failure.
>
> > We are talking about the sum of all concurrent requests, and the system
> > aborting when part of the requests already granted turns out to not be granted.
>

malloc(), as stated before, just sets a new break address when it
runs out of heap. It keeps track of the heap, but not very carefully:
it does no bounds checking on what you write there.

Memory on real machines is allocated in pages. Even the kernel doesn't
know that you have written past the end of an allocation until you
write to a page that isn't mapped into your process.

In the following code, I allocate so little memory from malloc() that
it is quite likely completely satisfied by whatever is in the heap.

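#include <stdio.h>
#include <stdlib.h>   /* malloc() */
#include <string.h>   /* strcpy() */

int main(void)
{
    char *p;
    p = malloc(0x10);   /* ask for only 16 bytes */
    if (p == NULL)
        return 1;
    /* Deliberate overflow: 35 bytes are copied into the 16-byte buffer. */
    strcpy(p, "01234567890ABCDEF0123456789ABCDEF\n");
    puts(p);
    return 0;
}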

Note that I deliberately overwrite the buffer! The machine does not
seg-fault because whatever is in the heap is already owned by the
process. However, subsequent calls to malloc() may fail because I
just might have corrupted my heap. Of course it's my heap, owned
by my process, so if I want to corrupt it, rendering malloc()
unusable, it's my business. The kernel neither knows nor cares.

So as you can see malloc() doesn't really allocate anything. It
just keeps track of whatever is in the heap and asks the kernel
for new pages, by setting the break address, whenever a local
allocation would fail.
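
As a rough illustration of "setting the break address", the sketch below moves the break directly with sbrk() and reports how far it moved. The helper name move_break() is made up for this example, and sbrk() is assumed to be available as on Linux with glibc. This is not how any particular malloc() is implemented; it only shows the kernel interface the text refers to.

```c
#define _DEFAULT_SOURCE   /* expose sbrk() in <unistd.h> on glibc */
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: move the program break by incr bytes and
 * return how far it actually moved. Returns 0 if the kernel refused.
 * No stdio or malloc() calls are made in between, because those may
 * move the break themselves.
 */
long move_break(intptr_t incr)
{
    char *before = sbrk(0);          /* sbrk(0) reports the current break */
    char *after;

    if (sbrk(incr) == (void *)-1)    /* ask the kernel to move the break */
        return 0;
    after = sbrk(0);
    return (long)(after - before);
}
```

Calling move_break(4096) and then move_break(-4096) grows the heap by 4096 bytes and gives it back. Neither call touches the new memory, so no page has to be faulted in.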

Setting a new break address just adds new pages to the process
page table. Each new page is marked 'not present'. Nothing is allocated.
This is a performance enhancement. If I attempt to access a page that
is not in the process page-table, the page-fault handler will send a
fatal signal to the process (seg-fault). If I attempt to access a page
that exists in the process page-table, but is not in memory
(the default case), the kernel will fault in a new page. Then the
kernel marks the page 'present' and subsequent accesses will not
cause a page-fault.
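
The fault-on-first-touch behaviour can be observed from user space. The sketch below is an assumption-laden example, not part of the kernel discussion itself: it maps a few anonymous pages with mmap() instead of moving the break, and uses the Linux mincore() call to count resident pages before and after writing to some of them. The names NPAGES, count_resident() and demand_page_demo() are invented for the example, and the mincore() prototype is the Linux one.

```c
#define _DEFAULT_SOURCE   /* expose mincore() and MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 16   /* size of the test mapping, in pages (arbitrary) */

/* Count how many of the NPAGES pages starting at base are resident. */
static int count_resident(void *base, size_t page)
{
    unsigned char vec[NPAGES];
    int i, n = 0;

    if (mincore(base, NPAGES * page, vec) != 0)
        return -1;
    for (i = 0; i < NPAGES; i++)
        n += vec[i] & 1;             /* low bit set: page is in memory */
    return n;
}

/*
 * Map NPAGES anonymous pages, write one byte to the first `touch`
 * pages, and return the resident count afterwards (-1 on error).
 * The resident count before touching is stored in *before.
 */
int demand_page_demo(int touch, int *before)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, NPAGES * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int i, after;

    if (p == MAP_FAILED)
        return -1;
    *before = count_resident(p, page);
    for (i = 0; i < touch && i < NPAGES; i++)
        p[(size_t)i * page] = 1;     /* first write faults the page in */
    after = count_resident(p, page);
    munmap(p, NPAGES * page);
    return after;
}
```

On a typical Linux system one would expect *before to be 0 and the return value to equal the number of pages touched, although the exact counts depend on the kernel.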

This happens on a page-by-page basis, conserving real pages when
possible. Page-faults are caused by hardware and they are very
fast. If a free page is present in memory, adding that page to
the page-table (by software) is also very fast. However, attempting
to free pages is very slow because they have to be stolen and
their contents written to disk storage.

A problem occurs when there are no longer any free pages to steal.
Since a read/write attempt was made to a page that will never be
present, the kernel can't just return control to the faulting task.
If it did so, the faulting task would think that whatever it read
or whatever it wrote was, in fact, secure in RAM. So, once you
are out of virtual RAM, you are in a heap of trouble, pun intended.

Suppose there is a way of solving this problem. It could be transparent
to applications; they would just have to be recompiled. Suppose
malloc() was changed so it contained a signal handler.

When malloc() attempts to set a new break address, it sets up a
handler. Then it calls the kernel to set a new break address.
malloc(), before accepting this address, could write a word of
zeros to the top of the allocation. This could cause a page-fault. If
the page-fault handler could not fault in a new page, it could
send a signal to the process (received by malloc()). malloc()
can then return NULL for the current allocation request. In this
manner, the caller of malloc() would always be assured that memory
was available.
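
A minimal sketch of that idea follows. It is purely hypothetical: the function name probed_sbrk() is invented, it writes a single byte rather than a word to sidestep alignment, and it assumes the kernel would deliver a catchable SIGSEGV or SIGBUS when it cannot fault in the page. A kernel that simply kills the process instead never reaches the handler.

```c
#define _DEFAULT_SOURCE   /* expose sbrk() on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* Handler: abandon the probe and jump back into probed_sbrk(). */
static void probe_failed(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/*
 * Hypothetical allocator core: extend the break by n bytes, then
 * touch the top of the new region. Returns the old break on success,
 * or NULL if the break could not be moved or the probe raised a signal.
 */
void *probed_sbrk(size_t n)
{
    struct sigaction sa, old_segv, old_bus;
    char *base;
    void *result;

    if (n == 0 || n > (size_t)INTPTR_MAX)
        return NULL;
    base = sbrk((intptr_t)n);            /* set the new break address */
    if (base == (char *)-1)
        return NULL;

    sa.sa_handler = probe_failed;        /* set up the handler */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(probe_env, 1) == 0) {
        *(volatile char *)(base + n - 1) = 0;   /* may page-fault */
        result = base;
    } else {
        sbrk(-(intptr_t)n);              /* probe failed: give it back */
        result = NULL;
    }

    sigaction(SIGSEGV, &old_segv, NULL); /* restore previous handlers */
    sigaction(SIGBUS, &old_bus, NULL);
    return result;
}
```

Only the top page of the new region is probed here, as in the description above. Probing every page would be needed to cover allocations larger than one page.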

Unfortunately, this is naive. The first time the break address was
extended, this would work. However, what happens after the kernel
steals pages from your task to satisfy other requests? Eventually
pages that you thought you owned have to be faulted in. There may
be no more pages to steal, so you, thinking you have safely allocated
real pages, are now deadlocked, and dead.

The only solution to an out-of-memory condition is to never run
out of memory. The place where all of the system information is
known is in "user space". The kernel readily "knows" stuff about the
current process, but retrieving information about other tasks in
a page-fault handler would result in an extremely poorly performing
machine. A user-space daemon can acquire information about all the
tasks, can detect runaway tasks, can safeguard special tasks like
Web Servers that haven't gone crazy, and can watch for
performance-hurting rogue programs.

Such a program, if properly designed, is the solution to such
out-of-memory conditions.
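
One small piece of such a daemon might look like the sketch below. It is only an assumption about how a monitor could be built: it reads /proc/meminfo, assumes the "MemFree:  <n> kB" line format found on current Linux systems, and compares the value with a caller-supplied threshold. The function names meminfo_field_kb() and memory_is_low() are invented for the example. Deciding which task to restrain or kill is the hard part and is not shown.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "key: <number>" at the start of a line in text and return
 * the number, or -1 if the key is missing or malformed.
 */
long meminfo_field_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');       /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/*
 * Return 1 if free memory is below threshold_kb, 0 if not,
 * and -1 if /proc/meminfo could not be read or parsed.
 */
int memory_is_low(long threshold_kb)
{
    char buf[8192];
    size_t n;
    long free_kb;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    free_kb = meminfo_field_kb(buf, "MemFree");
    if (free_kb < 0)
        return -1;
    return free_kb < threshold_kb;
}
```

A daemon could call memory_is_low() periodically and, when it returns 1, inspect the per-process entries under /proc to look for the runaway task.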

Cheers,
Dick Johnson

Penguin : Linux version 2.3.41 on an i686 machine (800.63 BogoMips).


This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:31 EST