wrong madvise(MADV_DONTNEED) semantic

From: Samuel Thibault
Date: Tue Jun 28 2005 - 09:18:42 EST


Hi,

There is something wrong with the current madvise(MADV_DONTNEED)
implementation. Both the manpage and the source code says that
MADV_DONTNEED means that the application does not care about the data,
so it might be thrown away by the kernel. But that's not what posix
says:

http://www.opengroup.org/onlinepubs/009695399/functions/posix_madvise.html

It says that "The posix_madvise() function shall have no effect on the
semantics of access to memory in the specified range". I.e. the data
that was recorded shall be saved!

The current linux implementation of MADV_DONTNEED is rather an
implementation of solaris' MADV_FREE, see its manpage:
http://docs.sun.com/app/docs/doc/816-5168/6mbb3hrde?a=view

Hence the current madvise_dontneed() implementation could be renamed
into madvise_free() and the appropriate MADV_FREE case be added, while
a new implementation of madvise_dontneed() _needs_ be written. It may
for instance go through the range so as to zap clean pages (since it is
safe), and set dirty pages as being least recently used so that they
will be considered as good candidates for eviction.

(And the manpage should get corrected too)

Regards,
Samuel Thibault
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/