Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy

From: Donet Tom
Date: Mon Feb 26 2024 - 08:10:21 EST



On 2/20/24 14:18, Michal Hocko wrote:
On Tue 20-02-24 09:27:25, Aneesh Kumar K.V wrote:
[...]
    case MPOL_PREFERRED_MANY:
        if (pol->flags & MPOL_F_MORON) {
            if (!mpol_preferred_should_numa_migrate(thisnid, curnid, pol))
                goto out;
            break;
        }

        /*
         * use current page if in policy nodemask,
         * else select nearest allowed node, if any.
         * If no allowed nodes, use current [!misplaced].
         */
        if (node_isset(curnid, pol->nodes))
            goto out;
        z = first_zones_zonelist(
                node_zonelist(thisnid, GFP_HIGHUSER),
                gfp_zone(GFP_HIGHUSER),
                &pol->nodes);
        polnid = zone_to_nid(z->zone);
        break;
    ....
    ..
    }

    /* Migrate the folio towards the node whose CPU is referencing it */
    if (pol->flags & MPOL_F_MORON) {
        polnid = thisnid;

        if (!should_numa_migrate_memory(current, folio, curnid,
                                        thiscpu))
            goto out;
    }

    if (curnid != polnid)
        ret = polnid;
out:
    mpol_cond_put(pol);

    return ret;
}
Ohh, right this code is confusing as hell. Thanks for the clarification.
With this in mind. There should be a comment warning about MPOL_F_MOF
always being unset as the userspace cannot really set it up.

Thanks!

Hi Michal

Sorry for the late reply.
If we set MPOL_F_NUMA_BALANCING from userspace, then the MPOL_F_MOF and MPOL_F_MORON flags get set in the kernel.

/* Basic parameter sanity check used by both mbind() and set_mempolicy() */
static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
{
    *flags = *mode & MPOL_MODE_FLAGS;
    *mode &= ~MPOL_MODE_FLAGS;

    if ((unsigned int)(*mode) >= MPOL_MAX)
        return -EINVAL;

    if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
        return -EINVAL;

    if (*flags & MPOL_F_NUMA_BALANCING) {
        if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
            *flags |= (MPOL_F_MOF | MPOL_F_MORON);
        else
            return -EINVAL;
    }

In the current kernel this is supported only for MPOL_BIND; we added support for MPOL_PREFERRED_MANY as well.
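
To make the effect of the check easier to see, here is a small userspace model of it. This is only a sketch, not the kernel code: the constants are local stand-ins whose values are illustrative assumptions, the name model_sanitize_mpol_flags is hypothetical, and since the tail of the kernel function is not shown in the excerpt, the model simply returns 0 on success.

```c
#include <errno.h>

/* Local stand-ins for the kernel constants; values are illustrative only. */
enum {
	MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, MPOL_INTERLEAVE,
	MPOL_LOCAL, MPOL_PREFERRED_MANY, MPOL_MAX
};

#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)
#define MPOL_F_NUMA_BALANCING	(1 << 13)
#define MPOL_MODE_FLAGS		(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | \
				 MPOL_F_NUMA_BALANCING)

/* Internal flags in this model; userspace cannot pass them directly. */
#define MPOL_F_MOF		(1 << 3)
#define MPOL_F_MORON		(1 << 4)

/* Hypothetical userspace model of the check with the patch applied. */
static int model_sanitize_mpol_flags(int *mode, unsigned short *flags)
{
	/* Split the combined value into mode flags and the bare mode. */
	*flags = *mode & MPOL_MODE_FLAGS;
	*mode &= ~MPOL_MODE_FLAGS;

	if ((unsigned int)(*mode) >= (unsigned int)MPOL_MAX)
		return -EINVAL;

	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
		return -EINVAL;

	if (*flags & MPOL_F_NUMA_BALANCING) {
		/* Only these two modes may opt in to NUMA balancing. */
		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
		else
			return -EINVAL;
	}

	return 0;
}
```

In this model, passing MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING succeeds and leaves both MPOL_F_MOF and MPOL_F_MORON set, while any other mode combined with MPOL_F_NUMA_BALANCING (apart from MPOL_BIND) is rejected with -EINVAL.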

Why is the MPOL_F_MOF flag required?
------------------------------------
For NUMA migration, "task_numa_work" periodically unmaps the process memory. If the unmapped memory is
accessed again, a NUMA hinting page fault occurs and the pages get migrated in the page fault handler.

If MPOL_F_MOF is not set, "task_numa_work" will not unmap the process pages, so neither the NUMA hinting
page fault nor the migration will occur. This change was introduced by commit
fc3147245d193b (mm: numa: Limit NUMA scanning to migrate-on-fault VMAs).

How the new implementation works
--------------------------------
MPOL_PREFERRED_MANY can now get MPOL_F_MOF and MPOL_F_MORON set through MPOL_F_NUMA_BALANCING, so NUMA hinting
page faults will occur. In mpol_misplaced, if we can do NUMA migration, we select the node of the currently
executing CPU as the target node; otherwise we return from the function with ret = NUMA_NO_NODE.
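
The flow described in the two sections above can be sketched as a tiny userspace model. This is only an illustration under simplifying assumptions: struct model_fault, model_migration_target and MODEL_NUMA_NO_NODE are hypothetical names, the two boolean fields stand in for the results of mpol_preferred_should_numa_migrate() and should_numa_migrate_memory(), and the non-MORON nodemask path of mpol_misplaced is left out.

```c
#include <stdbool.h>

#define MODEL_NUMA_NO_NODE	(-1)	/* stand-in for NUMA_NO_NODE */

/* Hypothetical, simplified view of one access to a page. */
struct model_fault {
	bool mof;		/* MPOL_F_MOF set on the policy */
	bool moron;		/* MPOL_F_MORON set on the policy */
	bool policy_allows;	/* stand-in for mpol_preferred_should_numa_migrate() */
	bool balancer_allows;	/* stand-in for should_numa_migrate_memory() */
	int curnid;		/* node the page currently resides on */
	int thisnid;		/* node of the CPU referencing the page */
};

/* Returns the target node, or MODEL_NUMA_NO_NODE if nothing should move. */
static int model_migration_target(const struct model_fault *f)
{
	/* Without MOF the range is never unmapped, so no hinting fault. */
	if (!f->mof)
		return MODEL_NUMA_NO_NODE;

	/* This sketch only covers the MORON path. */
	if (!f->moron)
		return MODEL_NUMA_NO_NODE;

	/* Both the policy check and the balancing check must agree. */
	if (!f->policy_allows || !f->balancer_allows)
		return MODEL_NUMA_NO_NODE;

	/* Already on the referencing node: not misplaced. */
	if (f->curnid == f->thisnid)
		return MODEL_NUMA_NO_NODE;

	return f->thisnid;
}
```

With mof and moron both set and both checks passing, a page on node 0 referenced from node 1 gets node 1 as its target; clearing mof alone is enough to make the model return MODEL_NUMA_NO_NODE.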

So, since we are able to set MPOL_F_MOF from userspace through MPOL_F_NUMA_BALANCING, there is no need to add this comment, right?

Thanks
Donet Tom