Re: [RFC PATCH 11/26] hugetlb: add hugetlb_walk_to to do PT walks

From: Mike Kravetz
Date: Thu Jul 07 2022 - 19:04:01 EST


On 06/27/22 18:37, manish.mishra wrote:
>
> On 24/06/22 11:06 pm, James Houghton wrote:
> > This adds it for architectures that use GENERAL_HUGETLB, including x86.

I expect this will be used in arch independent code and there will need to
be at least a stub for all architectures?

> >
> > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx>
> > ---
> > include/linux/hugetlb.h | 2 ++
> > mm/hugetlb.c | 45 +++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 47 insertions(+)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index e7a6b944d0cc..605aa19d8572 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -258,6 +258,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
> > unsigned long addr, unsigned long sz);
> > pte_t *huge_pte_offset(struct mm_struct *mm,
> > unsigned long addr, unsigned long sz);
> > +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> > + unsigned long addr, unsigned long sz, bool stop_at_none);
> > int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
> > unsigned long *addr, pte_t *ptep);
> > void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 557b0afdb503..3ec2a921ee6f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6981,6 +6981,51 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> > return (pte_t *)pmd;
> > }
>
>
> not strong feeling but this name looks confusing to me as it does
>
> not only walk over page-tables but can also alloc.
>

Somewhat agree. With this we have:
- huge_pte_offset to walk/lookup a pte
- huge_pte_alloc to allocate ptes
- hugetlb_walk_to which does some/all of both

Do not see anything obviously wrong with the routine, but future
direction would be to combine/clean up these routines with similar
purpose.
--
Mike Kravetz

> > +int hugetlb_walk_to(struct mm_struct *mm, struct hugetlb_pte *hpte,
> > + unsigned long addr, unsigned long sz, bool stop_at_none)
> > +{
> > + pte_t *ptep;
> > +
> > + if (!hpte->ptep) {
> > + pgd_t *pgd = pgd_offset(mm, addr);
> > +
> > + if (!pgd)
> > + return -ENOMEM;
> > + ptep = (pte_t *)p4d_alloc(mm, pgd, addr);
> > + if (!ptep)
> > + return -ENOMEM;
> > + hugetlb_pte_populate(hpte, ptep, P4D_SHIFT);
> > + }
> > +
> > + while (hugetlb_pte_size(hpte) > sz &&
> > + !hugetlb_pte_present_leaf(hpte) &&
> > + !(stop_at_none && hugetlb_pte_none(hpte))) {
>
> Should this ordering of if-else condition be in reverse, i mean it will look
>
> more natural and possibly less condition checks as we go from top to bottom.
>
> > + if (hpte->shift == PMD_SHIFT) {
> > + ptep = pte_alloc_map(mm, (pmd_t *)hpte->ptep, addr);
> > + if (!ptep)
> > + return -ENOMEM;
> > + hpte->shift = PAGE_SHIFT;
> > + hpte->ptep = ptep;
> > + } else if (hpte->shift == PUD_SHIFT) {
> > + ptep = (pte_t *)pmd_alloc(mm, (pud_t *)hpte->ptep,
> > + addr);
> > + if (!ptep)
> > + return -ENOMEM;
> > + hpte->shift = PMD_SHIFT;
> > + hpte->ptep = ptep;
> > + } else if (hpte->shift == P4D_SHIFT) {
> > + ptep = (pte_t *)pud_alloc(mm, (p4d_t *)hpte->ptep,
> > + addr);
> > + if (!ptep)
> > + return -ENOMEM;
> > + hpte->shift = PUD_SHIFT;
> > + hpte->ptep = ptep;
> > + } else
> > + BUG();
> > + }
> > + return 0;
> > +}
> > +
> > #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
> > #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING