[PATCH 0/4] Convert khugepaged to a task_work function

From: Alex Thorlton
Date: Wed Oct 22 2014 - 22:35:40 EST


Hey everyone,

Last week, while discussing possible fixes for some unexpected/unwanted behavior
from khugepaged (see: https://lkml.org/lkml/2014/10/8/515) several people
mentioned possibly changing changing khugepaged to work as a task_work function
instead of a kernel thread. This will give us finer grained control over the
page collapse scans, eliminate some unnecessary scans since tasks that are
relatively inactive will not be scanned often, and eliminate the unwanted
behavior described in the email thread I mentioned.

This initial patch is fully functional, but there are quite a few areas that
will need to be polished up before it's ready to be considered for a merge. I
wanted to get this initial version out with some basic test results quickly, so
that people can give their opinions and let me know if there's anything they'd
like to see done differently (and there probably is :). I'll give details on
the code in the individual patches.

I gathered some pretty rudimentary test data using a 48-thread NAMD simulation
pinned to a cpuset with 8 cpus and about 60g of memory. I'm checking to see if
I'm allowed to publish the input data so that others can replicate the test. In
the meantime, if somebody knows of a publicly available benchmark that stresses
khugepaged, that would be helpful.

The only data point I gathered was the number of pages collapsed, sampled every
ten seconds, for the lifetime of the job. This one statistic gives a pretty
decent illustration of the difference in behavior between the two kernels, but I
intend to add some other counters to measure fully completed scans, failed
allocations, and possibly scans skipped due to timer constraints.

The data for the standard kernel (with a very small patch to add the stat
counter that I used to the task_struct) is available here:

http://oss.sgi.com/projects/memtests/pgcollapse/output-khpd

This was a fairly recent kernel (last Tuesday). Commit ID:
2d65a9f48fcdf7866aab6457bc707ca233e0c791. I'll send the patches I used for that
kernel as a reply to this message shortly.

The output from the modified kernel is stored here:

http://oss.sgi.com/projects/memtests/pgcollapse/output-pgcollapse

The output is stored in a pretty dumb format (*really* wide). Best viewed in a
simple text editor with word wrap off, just fyi.

Quick summary of what I found: Both kernels performed about the same when it
comes to overall runtime, my kernel was 22 seconds faster with a total runtime
of 4:13:07. Not a significant difference, but important to note that there was
no apparent performance degradation. The most interesting result is that my
kernel completed the majority of the necessary page collapses for this job in
2:04, whereas the mainline kernel took 29:05 to get to the same point.

Let me know what you think. Any suggestions are appreciated!

- Alex

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Bob Liu <lliubbo@xxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx

Alex Thorlton (4):
Disable khugepaged thread
Add pgcollapse controls to task_struct
Convert khugepaged scan functions to work with task_work
Add /proc files to expose per-mm pgcollapse stats

fs/proc/base.c | 23 +++++++
include/linux/khugepaged.h | 10 ++-
include/linux/sched.h | 16 +++++
kernel/fork.c | 7 ++
kernel/sched/fair.c | 18 +++++
mm/huge_memory.c | 162 +++++++++++++++------------------------------
6 files changed, 123 insertions(+), 113 deletions(-)

--
1.7.12.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/