[PATCH 0/6] Enable parallel page migration

From: Anshuman Khandual
Date: Fri Feb 17 2017 - 06:26:12 EST

This patch series is based on the work posted by Zi Yan back in
November 2016 (https://lkml.org/lkml/2016/11/22/457) but includes some
cleanup and reorganization. This series depends on the THP migration
optimization patch series posted by Naoya Horiguchi on 8th November 2016
(https://lwn.net/Articles/705879/). Though Zi Yan has recently reposted
V3 of the THP migration patch series (https://lwn.net/Articles/713667/),
this series is yet to be rebased on it.

The primary motivation behind this patch series is to achieve higher
memory migration bandwidth whenever possible by using a multi threaded
copy instead of a single threaded one. All the experiments were run on a
two socket X86 system (Intel(R) Xeon(R) CPU E5-2650) and use the same
allocation size of 4K * 100000 bytes (which does not split evenly into
2MB huge pages). Here are the results.


Moved 100000 normal pages in 247.000000 msecs 1.544412 GBs
Moved 100000 normal pages in 238.000000 msecs 1.602814 GBs
Moved 195 huge pages in 252.000000 msecs 1.513769 GBs
Moved 195 huge pages in 257.000000 msecs 1.484318 GBs

THP migration improvements:

Moved 100000 normal pages in 302.000000 msecs 1.263145 GBs
Moved 100000 normal pages in 262.000000 msecs 1.455991 GBs
Moved 195 huge pages in 120.000000 msecs 3.178914 GBs
Moved 195 huge pages in 129.000000 msecs 2.957130 GBs

THP migration improvements + Multi threaded page copy:

Moved 100000 normal pages in 1589.000000 msecs 0.240069 GBs **
Moved 100000 normal pages in 1932.000000 msecs 0.197448 GBs **
Moved 195 huge pages in 54.000000 msecs 7.064254 GBs ***
Moved 195 huge pages in 86.000000 msecs 4.435694 GBs ***

** Using multi threaded copy can be detrimental to performance if
used for regular pages, which are way too small. The framework
still provides the means to use it if some kernel/driver caller
or user application wants to.

*** These applications used the new MPOL_MF_MOVE_MT flag while
calling system calls like mbind() and move_pages().
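
For reference, the bandwidth column above is consistent with the total
allocation (4K * 100000 bytes) divided by the elapsed time and expressed
in GiB per second, for the normal page and the huge page rows alike. A
minimal sketch of that arithmetic follows. The helper name is made up
for illustration and is not taken from the test program.

```c
#include <assert.h>

/* Hypothetical helper: bytes moved per second, expressed in GiB/s. */
static double migration_bw_gibps(unsigned long nr_pages,
				 unsigned long page_size, double msecs)
{
	double bytes = (double)nr_pages * (double)page_size;

	assert(msecs > 0.0);
	return bytes / (1024.0 * 1024.0 * 1024.0) / (msecs / 1000.0);
}
```

With these inputs, migration_bw_gibps(100000, 4096, 247.0) comes out at
about 1.5444 and the same byte count over 54 msecs at about 7.0643,
which lines up with the first baseline row and the best multi threaded
huge page row.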

On POWER8 the improvements are similar when tested with a draft patch
which enables migration at PMD level. Not putting out the results here
as the kernel is not stable with that draft patch and sometimes
crashes. We are working on enabling PMD level migration on POWER8 and
will test this series out thoroughly when it's ready.

Patch Series Description::

Patch 1: Add a new parameter to migrate_page_copy and copy_huge_page so
that they can tell when to use the single threaded version
(MIGRATE_ST) and when to use the multi threaded version (MIGRATE_MT).

Patch 2: Make migrate_mode types non-exclusive.

Patch 3: Add copy_pages_mthread function which does the actual multi
threaded copy. This involves splitting the copy work into
chunks, selecting threads and submitting copy jobs to the
work queues.

Patch 4: Add new migrate mode MIGRATE_MT to be used by higher level
migration functions.

Patch 5: Add new migration flag MPOL_MF_MOVE_MT for migration system
calls to be used in the user space.

Patch 6: Define global mt_page_copy tunable which turns on the multi
threaded page copy no matter what for all migrations on the
system.
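
To illustrate how user space might opt in through the flag added in
Patch 5, here is a minimal sketch that passes it to move_pages() through
the raw system call, which avoids a libnuma dependency. The numeric
value given to MPOL_MF_MOVE_MT below is an assumption made purely for
illustration; the real definition comes from the patched
include/uapi/linux/mempolicy.h. Both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/mempolicy.h>	/* MPOL_MF_MOVE */
#include <sys/syscall.h>	/* SYS_move_pages */
#include <unistd.h>		/* syscall() */

/*
 * ASSUMPTION: placeholder value for illustration only. Take the real
 * definition from the patched include/uapi/linux/mempolicy.h.
 */
#ifndef MPOL_MF_MOVE_MT
#define MPOL_MF_MOVE_MT	(1 << 6)
#endif

/* Hypothetical helper: build the flags argument for a migration call. */
static int migrate_flags(int use_mt)
{
	int flags = MPOL_MF_MOVE;

	/* The new bit must not alias the existing move flag. */
	assert((MPOL_MF_MOVE & MPOL_MF_MOVE_MT) == 0);
	if (use_mt)
		flags |= MPOL_MF_MOVE_MT;
	return flags;
}

/*
 * Hypothetical wrapper: move @count pages of the calling process (pid 0)
 * to the nodes listed in @nodes. Returns whatever move_pages() returns;
 * per-page results are written to @status.
 */
long move_pages_self(unsigned long count, void **pages,
		     const int *nodes, int *status, int use_mt)
{
	return syscall(SYS_move_pages, 0, count, pages, nodes, status,
		       migrate_flags(use_mt));
}
```

On a kernel without this series the extra bit is expected to be
rejected, so a caller would likely want to retry with use_mt set to 0
when the call fails with EINVAL.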

Outstanding Issues::

Issue 1: The usefulness of the global multi threaded copy tunable, i.e.
vm.mt_page_copy. It makes sense and helps in validating the
framework. Should this be moved to debugfs instead ?

Issue 2: We chose nr_copythreads = 8 because the maximum number of
threads on a node can be 8 on any architecture (which is on
POWER8, if I am not missing any other arch which might have an
equal or higher number of threads per node). It just denotes the
max number of threads and will be adjusted based on the
cpumask_weight value on the destination node. Can we do better,
suggestions ?

Issue 3: Multi threaded page migration works best with threads allocated
on different physical cores, not all in the same hyper-threaded
core. Jobs submitted to work queues consume scheduler slots on
the given thread to execute the copy. This can interfere with
scheduling and affect some already running tasks on the system.
Should we be looking into arch topology information and scheduler
cpu idle details to decide which threads to use before going
for multi threaded copy ? Should we abort multi threaded copy and
fall back to regular copy when the parameters are not good ?
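
The thread count adjustment from Issue 2 and the chunking described for
Patch 3 can be sketched in plain C as follows. This is a user space
illustration with hypothetical names, not code from
mm/copy_pages_mthread.c: node_cpus stands in for the cpumask_weight()
of the destination node, and giving the remainder to the last chunk is
an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define NR_COPYTHREADS_MAX	8	/* upper bound discussed in Issue 2 */

struct copy_chunk {
	size_t offset;	/* byte offset into the page being copied */
	size_t len;	/* number of bytes this worker copies */
};

/* Clamp the worker count to the CPUs available on the destination node. */
static unsigned int pick_nr_threads(unsigned int node_cpus)
{
	if (node_cpus == 0)
		return 1;
	return node_cpus < NR_COPYTHREADS_MAX ? node_cpus : NR_COPYTHREADS_MAX;
}

/*
 * Split @total bytes into @nr chunks of equal size; the last chunk also
 * takes whatever is left over. Returns the number of chunks written.
 */
static unsigned int split_copy(size_t total, unsigned int nr,
			       struct copy_chunk *chunks)
{
	size_t per;
	unsigned int i;

	assert(nr > 0 && nr <= NR_COPYTHREADS_MAX);
	per = total / nr;
	for (i = 0; i < nr; i++) {
		chunks[i].offset = (size_t)i * per;
		chunks[i].len = per;
	}
	chunks[nr - 1].len = total - chunks[nr - 1].offset;
	return nr;
}
```

Under this scheme a 2MB huge page split across 8 workers gives each one
256KB, while a 4K page split the same way leaves only 512 bytes per
worker, which fits the observation above that regular pages are too
small to benefit.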

Any comments, suggestions are welcome.

Zi Yan (6):
mm/migrate: Add new mode parameter to migrate_page_copy() function
mm/migrate: Make migrate_mode types non-exclusive
mm/migrate: Add copy_pages_mthread function
mm/migrate: Add new migrate mode MIGRATE_MT
mm/migrate: Add new migration flag MPOL_MF_MOVE_MT for syscalls
sysctl: Add global tunable mt_page_copy

fs/aio.c | 2 +-
fs/f2fs/data.c | 2 +-
fs/hugetlbfs/inode.c | 2 +-
fs/ubifs/file.c | 2 +-
include/linux/highmem.h | 2 +
include/linux/migrate.h | 6 ++-
include/linux/migrate_mode.h | 8 ++--
include/uapi/linux/mempolicy.h | 4 +-
kernel/sysctl.c | 10 +++++
mm/Makefile | 2 +
mm/compaction.c | 20 +++++-----
mm/copy_pages_mthread.c | 87 ++++++++++++++++++++++++++++++++++++++++++
mm/mempolicy.c | 7 +++-
mm/migrate.c | 81 +++++++++++++++++++++++++++------------
14 files changed, 190 insertions(+), 45 deletions(-)
create mode 100644 mm/copy_pages_mthread.c