Re: AIX disclaim() or Tru64 madvise (MADV_DONTNEED) needed

Christoph Rohland (hans-christoph.rohland@sap-ag.de)
20 Aug 1999 20:24:45 +0200


"Dr. Michael Weller" <eowmob@exp-math.uni-essen.de> writes:

> On 20 Aug 1999, Christoph Rohland wrote:
>
> > Tethys <tethys@it.newsint.co.uk> writes:
> >
> > > Christoph Rohland <hans-christoph.rohland@sap-ag.de> writes:
> [...]
> > > Why is this a problem? Whether the pages are freed immediately
> > > or merely marked to be freed at a later time should be of no
> > > relevance to the application, only to the OS. Or are you relying
> > > on the zero-fill-on-demand behaviour? What do you do on other
> > > OSes that don't support this (e.g., Solaris)?

I forgot: until now we use this implementation only on AIX.

> > Because the other OSes do not free the pages; they only take it as
> > a hint to the memory subsystem that these pages are not needed in
> > the near future, so they can be paged out.
>
> As an attempt at clarification: I don't think he relies semantically on
> the zero-fill-on-demand behaviour. But he wants to tell the system
> that he is going to COMPLETELY replace the pages with other data.
>
> The point IS NOT to tell the system: You might want to page these
> pages out, I'm not going to need them soon. The general idea is that
> linux is efficient enough to know better if and when it will page
> this out or not (Now, knowing there is no intelligence in any
> computer program at all, I don't know if I should believe the
> authors of the mm system on that)
>
> The point is that it makes no sense to swap out these pages at all:
> It will use disk space and I/O time to page them back in. And all
> that only to completely overwrite them.
>
> Usually, the way to go would be to just mmap /dev/zero over
> them. But the problem is that this is shared memory and the 'shared'
> aspect of the pages must be kept although their current content can
> be lost until they are modified next time.
>
> On the other hand, if there is plenty of memory at hand, it makes no
> sense to replace them with 0000 either. That would only impose CPU
> overhead when the pages are copied from the zero-page on write.

Yes, this is right. The perfect semantics would be: throw a page away
instead of paging it out, and if it is already paged out, free its
swap page. But freeing the page unconditionally is also OK.
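
As a reading aid, the semantics described above can be written down as
a small decision function. This is only a toy model, not kernel code:
the names page_location, reclaim_action and dontcare_action are
invented for this sketch, and "memory_is_tight" stands in for whatever
condition would make the kernel page the page out.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
enum page_location  { PAGE_IN_RAM, PAGE_ON_SWAP };
enum reclaim_action { LEAVE_ALONE, DROP_PAGE, FREE_SWAP_SLOT };

/* What to do with a page whose contents the application has declared dead. */
static enum reclaim_action
dontcare_action(enum page_location where, bool memory_is_tight)
{
    if (where == PAGE_ON_SWAP)
        return FREE_SWAP_SLOT;  /* already paged out: release its swap page */
    if (memory_is_tight)
        return DROP_PAGE;       /* about to be paged out: discard, no write */
    return LEAVE_ALONE;         /* plenty of RAM: nothing to gain by freeing */
}
```

Freeing unconditionally, the simpler variant mentioned above, would
correspond to returning DROP_PAGE for every page still in RAM.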

> From this point of efficiency, thinking about it, maybe something like
> madvise(MADV_DONTCARE) or even madvise(MADV_DONTNEED) (But an application
> might rely on other OS's not replacing the contents on MADV_DONTNEED;

That is exactly why I do not want to use MADV_DONTNEED: it would break
on other systems. MADV_DONTCARE sounds good to me :-)

> As an application coder, I might try to just use madvise() or use a
> placebo #define madvise(a,b,c) if the libc doesn't have one. If I were
> you, I'd definitely try that. The more I think about this, the more I
> suspect that the mere existence of these calls on the other systems
> falsely makes you think there is any real point in calling them.

I have that, and we see very commonplace scenarios where the machine
uses GBs of swap and is very slow because unneeded pages are swapped
out.
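
For illustration, the placebo #define suggested in the quoted text
could look roughly like the sketch below. HAVE_MADVISE is assumed to be
a configure-style macro and mark_buffer_unneeded is an invented helper
name; neither comes from this thread. The placebo expands to an
expression with value 0, so callers that check the return value still
compile.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* HAVE_MADVISE is a hypothetical build-time macro, set when libc has madvise(). */
#ifndef HAVE_MADVISE
/* Placebo: evaluate the arguments, do nothing, and report success. */
#define madvise(addr, len, advice) ((void)(addr), (void)(len), 0)
#endif

/* Hypothetical helper: hint that the buffer is not needed for now.
 * This is advice only; the caller must not rely on what the pages
 * contain afterwards, because systems differ on that point. */
static int mark_buffer_unneeded(void *buf, size_t len)
{
    return madvise(buf, len, MADV_DONTNEED);
}
```

With the placebo in effect the call costs nothing and changes nothing,
which is also why it does not help with the swap usage described above.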

> It is yet to be shown if the occasional (unneeded) copy-on-write from
> zero-page or the occasional (unneeded) page out+in using some very good
> and smart I/O system with raid (like you will have on any machine sensible
> for SAP) is more overhead.
>
> Anyway, due to the 4GB limit, you'll have to reuse the
> madvise(MADV_DONTNEED) interprocess communication buffers over time.

No, we can unmap the shm segment and map another one. This is the
whole trick of the implementation under discussion: instead of keeping
memory mapped all the time and reusing it, we unmap and map segments
when we do not have enough space. That is how you get over the 4GB
limit. Unfortunately, you can only free a complete shm segment.
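
A minimal sketch of that unmap/map pattern with the System V shm
calls, under stated assumptions: struct shm_window, window_switch and
segment_destroy are names made up for this example, and our real
implementation is not shown here.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper type: one address-space window onto many segments. */
struct shm_window {
    int   shmid;  /* currently attached segment, or -1 */
    void *addr;   /* where it is attached in this process, or NULL */
};

/* Detach whatever is attached and attach segment `shmid` instead. */
static int window_switch(struct shm_window *w, int shmid)
{
    if (w->addr != NULL && shmdt(w->addr) < 0)
        return -1;
    w->addr = NULL;
    w->shmid = -1;

    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1)
        return -1;
    w->addr = addr;
    w->shmid = shmid;
    return 0;
}

/* Memory is only given back per whole segment: mark it for removal.
 * It is actually destroyed once the last process has detached it. */
static int segment_destroy(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

A detached segment keeps its contents until it is destroyed, so a
process can switch back to it later and find the data unchanged.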

<snip>

> Just having to add some (actually few) MB of paging space should not be a
> problem.

Actually, many GB on big systems :-(

Greetings
Christoph
