Re: Migrate pages from a ccNUMA node to another

From: Robin Holt
Date: Fri Mar 26 2004 - 05:45:28 EST


On Fri, Mar 26, 2004 at 10:02:00AM +0100, Zoltan Menyhart wrote:
>
> Migrate pages from a ccNUMA node to another.
> ============================================
>
> 1. Hardware assisted migration
> ..............................
>

We have found that "automatic" migration tends to result in the
system deciding to move the wrong pieces around. Since applications
can be so varied, I would recommend we let the application decide
when it thinks it is beneficial to move a memory range to a nearby
node.

>
> 2. Application driven migration
> ...............................
>
> An application can exploit the forthcoming NUMA APIs to specify its initial
> memory placement policy.
> Yet what if the application wants to change its behavior ?

The placement policy doesn't really fit the bill entirely. We are
currently tracking a problem with repeatability of a benchmark. We
found that, with the old libc, a newly forked process touched a
page before the parent did. The page, which had been marked COW,
therefore ended up on the child's node for the child and on the
parent's node for the parent. After the libc update, both pages
ended up on the parent's node.

If your syscall simply did the copy to the destination node
for COW pages, this would have worked terrifically in both cases.
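
To make the two cases concrete, here is a toy user-space model of a
single COW page. It is only a sketch, not kernel code: every name in
it (cow_page, cow_fork, cow_write, migrate_child_page) is invented
for illustration, and it assumes plain first-touch local allocation
when COW is broken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one COW page after fork(). Not kernel code; all names
 * are hypothetical. */
struct cow_page {
    int  parent_node;  /* node of the page the parent maps */
    int  child_node;   /* node of the page the child maps */
    bool shared;       /* true while both still map the original page */
};

/* After fork() both processes map the parent's original page. */
struct cow_page cow_fork(int parent_node)
{
    struct cow_page p = { parent_node, parent_node, true };
    return p;
}

/* The first write breaks COW: the writer gets a private copy on the
 * node it is running on, the other process keeps the original page. */
void cow_write(struct cow_page *p, bool writer_is_child, int writer_node)
{
    assert(p != NULL);
    if (!p->shared)
        return;                      /* already private, nothing moves */
    if (writer_is_child)
        p->child_node = writer_node;
    else
        p->parent_node = writer_node;
    p->shared = false;
}

/* Assumed migration behavior: a still-shared COW page is copied to
 * the destination node for the child rather than being skipped. */
void migrate_child_page(struct cow_page *p, int dest_node)
{
    assert(p != NULL);
    p->child_node = dest_node;
    p->shared = false;
}
```

With the parent on node 0 and the child on node 1, a child-first
write leaves the child's page on node 1, while a parent-first write
leaves both pages on node 0. Calling migrate_child_page(&page, 1)
afterwards puts the child's page on node 1 in either case, which is
the repeatability we are after.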

>
> 3. NUMA aware scheduler
> .......................
>

Back to my earlier comment about magic. This is a second tier of
magic. Here we are talking about inferring a reason to migrate based
on memory access patterns, but what if that migration results in
some other process being hurt more than this one is helped?

Honestly, we have beaten on the scheduler quite a bit and the "allocate
memory close to my node" policy has helped considerably.

One thing that would probably help considerably, in addition to the
syscall you seem to be proposing, would be an addition to the
task_struct. The new field would specify which node to attempt
allocations on. Before doing a fork, the parent would do a
syscall to set this field to the node the child will target. It
would then call fork. The PGDs et al and associated memory, including
the task struct and pages, would end up being allocated based upon
that NUMA node's allocation preference.
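
A rough user-space sketch of that flow follows. It is a toy model
under stated assumptions, not a patch: the names toy_task,
alloc_node, toy_set_alloc_node and toy_fork are all made up, and
resetting the field in the child is just one possible choice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define NODE_ANY  (-1)      /* no preference: allocate where we run */

/* Stand-in for task_struct; every field name is hypothetical. */
struct toy_task {
    int running_node;       /* node the task currently runs on */
    int alloc_node;         /* the proposed new field */
    int task_struct_node;   /* where this task's task struct was placed */
    int pgd_node;           /* where this task's PGD was placed */
};

/* Stand-in for the proposed syscall: set the node that later
 * allocations should target. Returns 0 on success, -1 on a bad node. */
int toy_set_alloc_node(struct toy_task *t, int node)
{
    assert(t != NULL);
    if (node != NODE_ANY && (node < 0 || node >= MAX_NODES))
        return -1;
    t->alloc_node = node;
    return 0;
}

/* fork(): the child's structures go to the parent's alloc_node if one
 * is set, otherwise to the node the parent is running on. */
struct toy_task toy_fork(const struct toy_task *parent)
{
    assert(parent != NULL);
    int node = parent->alloc_node != NODE_ANY ? parent->alloc_node
                                              : parent->running_node;
    struct toy_task child = {
        .running_node     = node,
        .alloc_node       = NODE_ANY,  /* assumption: not inherited */
        .task_struct_node = node,
        .pgd_node         = node,
    };
    return child;
}
```

A parent running on node 0 that calls toy_set_alloc_node(&parent, 2)
and then toy_fork(&parent) gets a child whose task struct and PGD
are both on node 2. Without the call they land on node 0.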


What do you think of combining these two items into a single syscall?

>
> User mode interface
> -------------------
>
> This prototype of the page migration service is implemented as a system call,
> the different forms of which are wrapped by use of some small,
> static, inline functions.
>
> NAME
> migrate_ph_pages - migrate pages to another NUMA node

At first, I thought "Wow, this could result in some nice admin tools."
The more I scratch my head on this, the less useful I see it, but
I would not argue against it.

> migrate_virt_addr_range - migrate virtual address range to another node

This one sounds good.

Thanks,
Robin Holt