Re: Overcomittable memory

From: John Ripley (john@empeg.com)
Date: Tue Mar 21 2000 - 06:38:20 EST


James Sutherland wrote:
> >Blindingly simple example app:
> >
> >1) syscall - sys_overcommit(off) == this process will not overcommit
>
> No, that's NOT what that syscall does (despite the name...)
>
> In fact, setting it >0 simply disables ALL sanity checking on malloc.
> You want 2Gb on a 4Mb 386, malloc() succeeds. A fairly specialist
> setting, really.

You seem to be misunderstanding what I mean. sys_overcommit(off) here is
an imaginary syscall which disables overcommit completely on a
per-process basis.

> >2) Allocate 512MB of memory with brk(). May fail if this can't be 100%
> >guaranteed. No problem, the user just ran it all of a second ago.
>
> This does NOT depend on (and indeed is not affected by) overcommit,
> provided you touch the memory.

See above.

> >3) mmap() a result file, say 1GB size. Map flags are MAP_SHARED,
> >PROT_WRITE. May fail if the kernel cannot allocate structures. Ditto, no
> >problem.
> >
> >4) Main loop, never allocate any memory during runtime. Use the 512MB as
> >a heap. Or better yet, don't do any dynamic allocation. In fact, don't
> >do any syscalls at all. Run here for a few weeks...
>
> Just run for a few weeks without logging or journalling anything. Hrm.
> Bright design. Supposing there is a power failure or system crash
> during this time?

Don't be so pedantic. It can write to the result file at regular
intervals using msync(). It's only a bloody example.

> >5) Write results into mapped result file. msync()
> >
> >Would you not agree that once stage 4 is reached the program CANNOT
> >POSSIBLY DIE for any reason whatsoever?
>
> Nope. I just poured coffee over the CPU. Only Nortel and Tandem
> hardware handles that gracefully :-)

See above.

> > That is, so long as the 512MB
> >requested by brk() are reserved. No copy, fill or faulting required.
> >Overcommit apps can happily allocate loads of memory on top of this, but
> >they'll just die when memory is depleted to what was reserved.
>
> This is true with or without overcommit - they will just die
> earlier/more often without it.

Ok, I'm shocked how you can come to that conclusion from the example I
gave. HOW can that example program die, apart from hardware failure as
you so pedantically point out?

> >It's the ability to request non-overcommit that's essential, and I don't
> >see what you have against it. With the current scheme of things this is
> >impossible.
>
> No it isn't. Just touch the memory on allocation.

Inefficient, and pointless if you just have non-overcommit. Not to
mention it doesn't work. See below.

> >(Yes, msync() can fail, but that's "easily" fixable with reserved pages)
>
> All in all, a very sloppy design. If you look at sensible number
> crunching apps (Distributed.net and SETI@home are common examples)
> they checkpoint themselves frequently. Any serious app should do this;
> if it doesn't, that's just poor programming on your part.

It's only example. If you want to be that pedantic about it, then make
it write results out every now and then. Probably one extra line of
code.

> If you want non-overcommitted memory, just touch it on allocation.
> That's it. No need to mangle the kernel or anything else.

Does not work. Your code pages (read-only, shared mapping) can get
paged, and when they're hit again, you die. You can make sure nothing
gets paged by using mlockall(MCL_CURRENT|MCL_FUTURE) but that has two
problems:

1) The free pages count is incorrect when there are any shared read-only
mappings. This is a bug.

2) When mlockall'd you have the nasty side-effect of having all mappings
faulted in whether you want it or not - even if they're mapped with
PROT_NONE. Not necessarily a bug, but not pleasant either.

Again, why are you so opposed (e.g the most frequent poster on this
topic) to non-overcommit? Seeing as you've made it clear you'd never use
it, why are you trying to stop us using it?

-- 
John Ripley, empeg Ltd.
http://www.empeg.com

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Mar 23 2000 - 21:00:32 EST