Re: wrong madvise(MADV_DONTNEED) semantic

From: Andy Isaacson
Date: Tue Jun 28 2005 - 13:58:14 EST


On Tue, Jun 28, 2005 at 02:28:20PM -0400, Robert Love wrote:
> On Tue, 2005-06-28 at 11:16 -0700, Andy Isaacson wrote:
> > Besides, if you read the documentation closely, it does not say what you
> > think it says.
> >
> > MADV_DONTNEED
> > Do not expect access in the near future. (For the time
> > being, the application is finished with the given range,
> > so the kernel can free resources associated with it.)
> > Subsequent accesses of pages in this range will succeed,
> > but will result either in reloading of the memory contents
> > from the underlying mapped file (see mmap) or
> > zero-fill-on-demand pages for mappings without an
> > underlying file.
> >
> > You seem to think that "reloading ... from the underlying mapped file"
> > means that changes are lost, but that's not implied.
>
> This wording _does_ imply that changes are lost

I contest your interpretation of the manpage. While it could be read
the way you suggest, I claim that because Linux mmap is inherently
coherent (as opposed to, for example, AIX 4.1 mmap), the "underlying
file" already contains the updated contents. Ergo msync is not
required for correct MAP_SHARED semantics on Linux, and the manpage as
it stands is misleading, but both accurate to the 2.6.11
implementation and compliant with the POSIX description posted earlier.
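
A minimal sketch of the coherence claim, for anyone who wants to check it:
store through a MAP_SHARED mapping, then read the same offset back with
pread() without calling msync. The helper name and the /tmp scratch path
are hypothetical, and this is an illustration rather than a tested result.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a store through a MAP_SHARED mapping
 * is visible to pread() on the same file without msync, 0 if it is not,
 * and -1 on setup error. */
int store_visible_without_msync(void)
{
    char path[] = "/tmp/coherent-XXXXXX";   /* assumed scratch location */
    char buf[8] = {0};
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file goes away on close */
    if (ftruncate(fd, pagesz) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, "updated", 8);                /* store via the mapping only */

    /* No msync here: read the same bytes back through the file descriptor. */
    ssize_t n = pread(fd, buf, sizeof buf, 0);
    int seen = (n == (ssize_t)sizeof buf) && memcmp(buf, "updated", 8) == 0;

    munmap(p, (size_t)pagesz);
    close(fd);
    return seen;
}
```

On a kernel where mmap and read share one page cache, the expectation is
that the helper returns 1.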

> if the file is mapped writable and not msync'ed

This is the case that my posted example code exercises, and I did not
see any problems. Is there some additional circumstance that is
necessary to cause it to break? (I tested on 2.6.11-rc5 or something
close to that.)
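
For reference, the writable, not-msync'ed case can be sketched roughly as
below. This is not the example code posted earlier in the thread, only a
minimal reconstruction of the idea; the function name and the /tmp path
are assumptions.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: dirty a MAP_SHARED file mapping, call
 * madvise(MADV_DONTNEED) without msync, then re-read the range.
 * Returns 1 if the stores are still visible, 0 if they were lost,
 * and -1 on setup error. */
int shared_store_survives_dontneed(void)
{
    char path[] = "/tmp/dontneed-XXXXXX";   /* assumed scratch location */
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    size_t len = (size_t)pagesz;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, pagesz) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(p, 'A', len);                    /* dirty the page, no msync */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        close(fd);
        return -1;
    }

    /* Touch the range again; this refaults from the underlying file. */
    int survived = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 'A') {
            survived = 0;
            break;
        }
    }

    munmap(p, len);
    close(fd);
    return survived;
}
```

A return value of 0 would be the demonstration of differing semantics;
a return value of 1 matches what I described seeing.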

> or if the memory mapping is anonymous.
>
> In the latter case, the data is dropped and the pages are
> zero-filled on access.

Yes, MAP_ANONYMOUS is a more interesting case. Somebody else will have
to write the testcase for that...
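
A starting point for such a testcase might look like the sketch below. It
assumes a private anonymous mapping and Linux's MADV_DONTNEED behaviour as
described in the quoted text; the function name is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fill an anonymous private mapping, call
 * madvise(MADV_DONTNEED), then re-read it.
 * Returns 1 if every byte reads back as zero (data dropped),
 * 0 if any old data is still visible, and -1 on setup error. */
int anon_zero_filled_after_dontneed(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    size_t len = (size_t)pagesz;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0x5a, len);                   /* data with no backing file */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The next access should fault in fresh zero-filled pages. */
    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}
```

Here a return value of 1 would confirm the "data is dropped" reading for
the anonymous case.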

I think the correct docs fix is to simply delete the misleading parts of
madvise.2 so that it reads

    MADV_DONTNEED
        Do not expect access in the near future. (For the time
        being, the application is finished with the given range,
        so the kernel can free resources associated with it.)

and remove the erroneous parenthetical in the first paragraph.

... unless, of course, someone can actually demonstrate a case where
madvise results in differing semantics...

-andy