Re: [PATCH 3/5] migrate: Add copy_page_mt to use multi-threaded page migration.

From: Anshuman Khandual
Date: Thu Nov 24 2016 - 04:27:29 EST


On 11/22/2016 09:55 PM, Zi Yan wrote:
> From: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
>
> From: Zi Yan <ziy@xxxxxxxxxx>

Please fix these.

>
> Internally, copy_page_mt splits a page into multiple threads
> and send them as jobs to system_highpri_wq.

The function should be renamed as copy_page_multithread() or at
the least copy_page_mthread() to make more sense. The commit
message needs to more comprehensive and detailed.

>
> Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
> Signed-off-by: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
> ---
> include/linux/highmem.h | 2 ++
> kernel/sysctl.c | 1 +
> mm/Makefile | 2 ++
> mm/copy_page.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 101 insertions(+)
> create mode 100644 mm/copy_page.c
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bb3f329..519e575 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -236,6 +236,8 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
>
> #endif
>
> +int copy_page_mt(struct page *to, struct page *from, int nr_pages);
> +
> static inline void copy_highpage(struct page *to, struct page *from)
> {
> char *vfrom, *vto;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 706309f..d54ce12 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -97,6 +97,7 @@
>
> #if defined(CONFIG_SYSCTL)
>
> +

I guess this is a stray code change.

> /* External variables not in a header file. */
> extern int suid_dumpable;
> #ifdef CONFIG_COREDUMP
> diff --git a/mm/Makefile b/mm/Makefile
> index 295bd7a..467305b 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -41,6 +41,8 @@ obj-y := filemap.o mempool.o oom_kill.o \
>
> obj-y += init-mm.o
>
> +obj-y += copy_page.o

Its getting compiled all the time. Dont you want to make it part of
of a new config option which will cover for all these code for multi
thread copy ?

> +
> ifdef CONFIG_NO_BOOTMEM
> obj-y += nobootmem.o
> else
> diff --git a/mm/copy_page.c b/mm/copy_page.c
> new file mode 100644
> index 0000000..ca7ce6c
> --- /dev/null
> +++ b/mm/copy_page.c
> @@ -0,0 +1,96 @@
> +/*
> + * Parallel page copy routine.
> + *
> + * Zi Yan <ziy@xxxxxxxxxx>
> + *
> + */

No, this is too less. Please see other files inside mm directory as
example.

> +
> +#include <linux/highmem.h>
> +#include <linux/workqueue.h>
> +#include <linux/slab.h>
> +#include <linux/freezer.h>
> +
> +
> +const unsigned int limit_mt_num = 4;

>From where this number 4 came from ? At the very least it should be
configured from either a sysctl variable or from a sysfs file, so
that user will have control on number of threads used for copy. But
going forward this should be derived out a arch specific call back
which then analyzes NUMA topology and scheduler loads to figure out
on how many threads should be used for optimum performance of page
copy.

> +
> +/* ======================== multi-threaded copy page ======================== */
> +

Please use standard exported function description semantics while
describing this new function. I think its a good function to be
exported as a symbol as well.

> +struct copy_page_info {

s/copy_page_info/mthread_copy_struct/

> + struct work_struct copy_page_work;
> + char *to;
> + char *from;

Swap the order of 'to' and 'from'.

> + unsigned long chunk_size;

Just 'size' should be fine.

> +};
> +
> +static void copy_page_routine(char *vto, char *vfrom,

s/copy_page_routine/mthread_copy_fn/

> + unsigned long chunk_size)
> +{
> + memcpy(vto, vfrom, chunk_size);
> +}

s/chunk_size/size/


> +
> +static void copy_page_work_queue_thread(struct work_struct *work)
> +{
> + struct copy_page_info *my_work = (struct copy_page_info *)work;
> +
> + copy_page_routine(my_work->to,
> + my_work->from,
> + my_work->chunk_size);
> +}
> +
> +int copy_page_mt(struct page *to, struct page *from, int nr_pages)
> +{
> + unsigned int total_mt_num = limit_mt_num;
> + int to_node = page_to_nid(to);

Should we make sure that the entire page range [to, to + nr_pages] is
part of to_node.

> + int i;
> + struct copy_page_info *work_items;
> + char *vto, *vfrom;
> + unsigned long chunk_size;
> + const struct cpumask *per_node_cpumask = cpumask_of_node(to_node);

So all the threads used for copy has to be part of cpumask of the
destination node ? Why ? The copy accesses both the source pages as
well as destination pages. Source node threads might also perform
good for the memory accesses. Which and how many threads should be
used for copy should be decided wisely from an architecture call
back. On a NUMA system this will have impact on performance of the
multi threaded copy.


> + int cpu_id_list[32] = {0};
> + int cpu;
> +
> + total_mt_num = min_t(unsigned int, total_mt_num,
> + cpumask_weight(per_node_cpumask));
> + total_mt_num = (total_mt_num / 2) * 2;
> +
> + work_items = kcalloc(total_mt_num, sizeof(struct copy_page_info),
> + GFP_KERNEL);
> + if (!work_items)
> + return -ENOMEM;
> +
> + i = 0;
> + for_each_cpu(cpu, per_node_cpumask) {
> + if (i >= total_mt_num)
> + break;
> + cpu_id_list[i] = cpu;
> + ++i;
> + }
> +
> + vfrom = kmap(from);
> + vto = kmap(to);
> + chunk_size = PAGE_SIZE*nr_pages / total_mt_num;

Coding style ? Please run all these patches though scripts/
checkpatch.pl script to catch coding style problems.

> +
> + for (i = 0; i < total_mt_num; ++i) {
> + INIT_WORK((struct work_struct *)&work_items[i],
> + copy_page_work_queue_thread);
> +
> + work_items[i].to = vto + i * chunk_size;
> + work_items[i].from = vfrom + i * chunk_size;
> + work_items[i].chunk_size = chunk_size;
> +
> + queue_work_on(cpu_id_list[i],
> + system_highpri_wq,
> + (struct work_struct *)&work_items[i]);

I am not very familiar with the system work queues but is
system_highpri_wq has the highest priority ? Because if
the time spend waiting on these work queue functions to
execute increases it can offset out all the benefits we
get by this multi threaded copy.