Re: [PATCH v2] mm: hugetlb: support for shared memory policy

From: Aneesh Kumar K V
Date: Wed Oct 19 2022 - 08:15:11 EST


On 10/19/22 2:59 PM, Albert Huang wrote:
> From: "huangjie.albert" <huangjie.albert@xxxxxxxxxxxxx>
>
> Implement get/set_policy for hugetlb_vm_ops to support shared policy.
> This ensures that the mempolicy of all processes sharing this huge page
> file is consistent.
>
> In some scenarios where huge pages are shared:
> if we need to limit the memory usage of a VM to node0, we bind qemu's
> mempolicy to node0. But if another process (such as virtiofsd) shares
> memory with the VM and the page fault is triggered by virtiofsd, the
> allocated memory may go to node1, depending on virtiofsd. Although we
> can use the memory prealloc provided by qemu to avoid this issue, that
> method significantly increases the creation time of the VM (a few
> seconds, depending on memory size).
>
> After hooking up hugetlb_vm_ops (set/get_policy), shared memory segments
> created by shmget() with the SHM_HUGETLB flag and by
> mmap(MAP_SHARED|MAP_HUGETLB) also support shared policy.
>
> v1->v2:
> 1. hugetlb shares the memory policy when the vma has the VM_SHARED flag.
> 2. Update the documentation.
>
> Signed-off-by: huangjie.albert <huangjie.albert@xxxxxxxxxxxxx>
> ---
> .../admin-guide/mm/numa_memory_policy.rst | 20 +++++++++------
> mm/hugetlb.c | 25 +++++++++++++++++++
> 2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 5a6afecbb0d0..5672a6c2d2ef 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -133,14 +133,18 @@ Shared Policy
> the object share the policy, and all pages allocated for the
> shared object, by any task, will obey the shared policy.
>
> - As of 2.6.22, only shared memory segments, created by shmget() or
> - mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy. When shared
> - policy support was added to Linux, the associated data structures were
> - added to hugetlbfs shmem segments. At the time, hugetlbfs did not
> - support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> - shmem segments were never "hooked up" to the shared policy support.
> - Although hugetlbfs segments now support lazy allocation, their support
> - for shared policy has not been completed.
> + As of 2.6.22, only shared memory segments, created by shmget() without
> + SHM_HUGETLB flag or mmap(MAP_ANONYMOUS|MAP_SHARED) without MAP_HUGETLB
> + flag, support shared policy. When shared policy support was added to Linux,
> + the associated data structures were added to hugetlbfs shmem segments.
> + At the time, hugetlbfs did not support allocation at fault time--a.k.a
> + lazy allocation--so hugetlbfs shmem segments were never "hooked up" to
> + the shared policy support. Although hugetlbfs segments now support lazy
> + allocation, their support for shared policy has not been completed.
> +
> + after we hooked up hugetlb_vm_ops(set/get_policy):
> + both the shared memory segments created by shmget() with SHM_HUGETLB flag
> + and mmap(MAP_SHARED|MAP_HUGETLB), also support shared policy.
>
> As mentioned above in :ref:`VMA policies <vma_policy>` section,
> allocations of page cache pages for regular files mmap()ed
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87d875e5e0a9..fc7038931832 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4632,6 +4632,27 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
>  	return 0;
>  }
>
> +#ifdef CONFIG_NUMA
> +int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return 0;
> +
> +	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
> +}
> +
> +struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +	pgoff_t index;
> +
> +	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
> +}
> +#endif
> +
>  /*
>   * When a new function is introduced to vm_operations_struct and added
>   * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
> @@ -4645,6 +4666,10 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>  	.close = hugetlb_vm_op_close,
>  	.may_split = hugetlb_vm_op_split,
>  	.pagesize = hugetlb_vm_op_pagesize,
> +#ifdef CONFIG_NUMA
> +	.set_policy = hugetlb_vm_op_set_policy,
> +	.get_policy = hugetlb_vm_op_get_policy,
> +#endif
>  };
>
>  static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,


How is the current usage of

/* Set numa allocation policy based on index */
hugetlb_set_vma_policy(&pseudo_vma, inode, index);

enforcing the policy with the current code? Also, if we have get_policy(),
can we remove that usage in hugetlbfs_fallocate() after this patch? With
shared policy, we should be able to fetch the policy via get_vma_policy().

A related question: does shm_pseudo_vma_init() require that mpolicy_lookup?
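
For reference, the index that the patch's get_policy() passes to
mpol_shared_policy_lookup() can be sketched in userspace as below. This is
only an illustration under stated assumptions: struct sketch_vma,
sketch_policy_index() and SKETCH_PAGE_SHIFT are hypothetical stand-ins, not
kernel definitions, and a 4 KiB base page is assumed.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages; the kernel's PAGE_SHIFT is arch-specific. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the two vm_area_struct fields the patch reads. */
struct sketch_vma {
	uint64_t vm_start;	/* first virtual address of the mapping */
	uint64_t vm_pgoff;	/* file offset of vm_start, in base pages */
};

/*
 * Same expression as in hugetlb_vm_op_get_policy() from the patch:
 * offset of addr within the mapping, in base pages, plus the
 * mapping's starting page offset within the file.
 */
static inline uint64_t sketch_policy_index(const struct sketch_vma *vma,
					   uint64_t addr)
{
	return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}
```

With vm_start = 0x40000000 and vm_pgoff = 512, an address 2 MiB into the
mapping gives index 1024, so under these assumptions the index counts base
pages rather than huge pages.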

-aneesh