In the performance test shown on the cover, we repeatedly performed
touch and madvise(MADV_DONTNEED) actions, which simulated the case
you said above.
We did find a small amount of performance regression, but I think it is
acceptable, and no new perf hotspots have been added.
That test always accesses 2MiB and does it from a single thread. Things
might (IMHO will) look different when only accessing individual pages
and doing the access from one/multiple separate threads (that's what
No, it includes multi-threading:
Oh sorry, I totally skipped [2].
while (1) {
char *c;
char *start = mmap_area[cpu];
char *end = mmap_area[cpu] + FAULT_LENGTH;
pthread_barrier_wait(&barrier);
//printf("fault into %p-%p\n",start, end);
for (c = start; c < end; c += PAGE_SIZE)
*c = 0;
pthread_barrier_wait(&barrier);
for (i = 0; cpu==0 && i < num; i++)
madvise(mmap_area[i], FAULT_LENGTH, MADV_DONTNEED);
pthread_barrier_wait(&barrier);
}
Thread on cpu0 will use madvise(MADV_DONTNEED) to release the physical
memory of threads on other cpu.
I'll have a more detailed look at the benchmark. On a quick glimpse,
looks like the threads are also accessing a full 2MiB range, one page at
a time, and one thread is zapping the whole 2MiB range. A single CPU
only accesses memory within one 2MiB range IIRC.
Having multiple threads just access individual pages within a single 2
MiB region, and having one thread zap that memory (e.g., simulate
swapout) could be another benchmark.
We have to make sure to run with THP disabled (e.g., using
madvise(MADV_NOHUGEPAGE) on the complete mapping in the benchmark
eventually), because otherwise you might just be populating+zapping THPs
if they would otherwise be allowed in the environment.