Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

From: Andy Lutomirski
Date: Wed Nov 04 2015 - 16:44:01 EST

On Wed, Nov 4, 2015 at 12:00 PM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> The new proposal tries to fix the TLB issue. We introduce two madvise verbs:
> MARK_FREE: userspace notifies the kernel that the memory range can be
> discarded. The kernel just records the range at this stage. Should memory
> pressure happen, page reclaim can free the memory directly regardless of
> the pte state.
> MARK_NOFREE: userspace notifies the kernel that the memory range will be
> reused soon. The kernel deletes the record and prevents page reclaim from
> discarding the memory. If the memory wasn't reclaimed, userspace accesses
> the old memory; otherwise the access goes through normal page fault
> handling.
> The point is to let userspace tell the kernel whether memory can be
> discarded, instead of depending on the pte dirty bit used by MADV_FREE. With
> these, no TLB flush is required until page reclaim actually frees the memory
> (page reclaim needs to do the TLB flush for MADV_FREE too). It still
> preserves the lazy memory free merit of MADV_FREE.
> Compared to MADV_FREE, reusing memory with the new proposal isn't
> transparent, e.g. userspace must call MARK_NOFREE. But it's easy to utilize
> the new API in jemalloc.
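To make the quoted semantics concrete, here is a toy userspace model in Python. It is a sketch, not kernel code. MARK_FREE and MARK_NOFREE are only proposed verbs with no real madvise() constants behind them, so the class LazyFreeRegion and its methods are hypothetical stand-ins. The model assumes 4 KiB pages and treats a discarded page as unmapped, so the next access "faults" in a zero-filled page.

```python
from typing import List, Optional, Set

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class LazyFreeRegion:
    """Toy model of an anonymous mapping under the proposed verbs.

    Hypothetical: MARK_FREE/MARK_NOFREE were only proposed; there are no
    such madvise() constants, so these methods merely simulate them.
    """

    def __init__(self, npages: int) -> None:
        # None means "no page mapped": the next access takes a fault.
        self.pages: List[Optional[bytearray]] = [None] * npages
        self.marked: Set[int] = set()  # pages recorded by MARK_FREE
        self.faults = 0

    def _page(self, idx: int) -> bytearray:
        page = self.pages[idx]
        if page is None:
            # Normal page fault handling: map a fresh zero-filled page.
            self.faults += 1
            page = bytearray(PAGE_SIZE)
            self.pages[idx] = page
        return page

    def write(self, idx: int, data: bytes) -> None:
        self._page(idx)[: len(data)] = data

    def read(self, idx: int, n: int) -> bytes:
        return bytes(self._page(idx)[:n])

    def mark_free(self, start: int, count: int) -> None:
        # MARK_FREE: only record the range; PTEs and TLBs are untouched.
        self.marked.update(range(start, start + count))

    def mark_nofree(self, start: int, count: int) -> None:
        # MARK_NOFREE: forget the range so reclaim can no longer discard it.
        self.marked.difference_update(range(start, start + count))

    def reclaim(self) -> int:
        # Memory pressure: discard every recorded page regardless of its
        # PTE state (a real kernel would flush the TLB here). Returns the
        # number of mapped pages actually discarded.
        freed = 0
        for idx in sorted(self.marked):
            if self.pages[idx] is not None:
                self.pages[idx] = None
                freed += 1
        self.marked.clear()
        return freed
```

A jemalloc-style allocator would presumably call mark_free() when a dirty run is returned and mark_nofree() before handing that run out again. Since malloc() makes no promise about contents, either outcome (old bytes or zeros) is acceptable. That is the non-transparent part of the proposal: if mark_nofree() is skipped, reused memory can still be discarded out from under the application.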

I can't speak to the usefulness of this or to other arches, but on x86
(unless you have nohz_full or similar enabled), a pair of syscalls
should be *much* faster than an IPI or a page fault.

I don't know how expensive it is to write to a clean page or to access
an unaccessed page on x86. I'm sure it's not free (there's memory
bandwidth if nothing else), but it could be very cheap.
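The trade-off above can be written as a back-of-envelope cost model for one free/reuse cycle in which the page is not reclaimed. This is a sketch under stated assumptions. The CycleCosts fields are hypothetical inputs that you would measure yourself. MADV_FREE is charged one syscall, a TLB flush at free time, and the first write to a clean PTE (the cost that is uncertain above). The proposal is charged two syscalls. Cycles that end in reclaim are left out, because according to the quoted proposal reclaim flushes the TLB under both schemes and reuse then faults either way.

```python
from dataclasses import dataclass


@dataclass
class CycleCosts:
    """Per-operation costs (any consistent unit) for one free/reuse cycle.

    Hypothetical inputs: measure them on your own hardware; nothing here
    encodes real x86 numbers.
    """
    syscall: float      # one madvise()-style syscall round trip
    tlb_flush: float    # TLB flush at free time (an IPI if other CPUs run the mm)
    clean_write: float  # first write to a page whose PTE dirty bit was cleared


def madv_free_cost(c: CycleCosts) -> float:
    # MADV_FREE: one syscall that clears dirty bits and must flush the TLB;
    # reuse is transparent, but the first write hits a clean PTE.
    return c.syscall + c.tlb_flush + c.clean_write


def mark_free_cost(c: CycleCosts) -> float:
    # Proposal: MARK_FREE plus MARK_NOFREE, i.e. two syscalls and no flush.
    return 2 * c.syscall


def proposal_wins(c: CycleCosts) -> bool:
    # Equivalent to: syscall < tlb_flush + clean_write.
    return mark_free_cost(c) < madv_free_cost(c)
```

The comparison reduces to syscall < tlb_flush + clean_write. That is why a cheap syscall pair looks attractive on x86, except when the flush stays local (no other CPU is running the mm) and the write to a clean page turns out to be nearly free.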
