Re: [RFC 08/11] khugepaged: introduce khugepaged_scan_bitmap for mTHP support

From: Nico Pache
Date: Fri Jan 10 2025 - 16:49:15 EST


On Fri, Jan 10, 2025 at 7:54 AM Dev Jain <dev.jain@xxxxxxx> wrote:
>
>
>
> On 09/01/25 5:01 am, Nico Pache wrote:
> > khugepaged scans PMD ranges for potential collapse to a hugepage. To add
> > mTHP support we use this scan to instead record chunks of fully utilized
> > sections of the PMD.
> >
> > Create a bitmap to represent a PMD in MIN_MTHP_ORDER-sized chunks.
> > By default we set this to order 3. The reasoning is that with 4K pages
> > and 512 PTEs per PMD, this results in a 64-bit bitmap, which fits in a
> > single unsigned long on 64-bit architectures and allows some bitmap
> > optimizations. For other arches, like ARM64 with 64K pages, we can set
> > a larger order if needed.
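To make the sizing concrete: 512 PTEs / 8 PTEs per chunk = 64 chunks, i.e. one 64-bit word. Below is a minimal userspace sketch of turning per-PTE occupancy into that chunk bitmap. The SKETCH_* macros and build_chunk_bitmap() are hypothetical, and the rule that a chunk's bit is set only when all 8 of its PTEs are present is an assumption for illustration; the follow-up patch that fills the bitmap may use a different rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 4K base pages, 512 PTEs per PMD. */
#define SKETCH_PMD_NR      512
/* Hypothetical stand-ins for MIN_MTHP_ORDER / MIN_MTHP_NR. */
#define SKETCH_MIN_ORDER   3
#define SKETCH_CHUNK_NR    (1 << SKETCH_MIN_ORDER)          /* 8 PTEs per chunk */
#define SKETCH_BITMAP_BITS (SKETCH_PMD_NR / SKETCH_CHUNK_NR) /* 64 chunks */

/* 512 / 8 == 64, so the whole PMD fits in a single 64-bit word. */
_Static_assert(SKETCH_BITMAP_BITS == 64, "bitmap should be one 64-bit word");

/*
 * Build the chunk bitmap from a per-PTE "present" array.
 * Assumption: a chunk's bit is set only if every PTE in it is present.
 */
static uint64_t build_chunk_bitmap(const bool present[SKETCH_PMD_NR])
{
	uint64_t bitmap = 0;

	for (int chunk = 0; chunk < SKETCH_BITMAP_BITS; chunk++) {
		bool full = true;

		for (int i = 0; i < SKETCH_CHUNK_NR; i++) {
			if (!present[chunk * SKETCH_CHUNK_NR + i]) {
				full = false;
				break;
			}
		}
		if (full)
			bitmap |= UINT64_C(1) << chunk;
	}
	return bitmap;
}
```

Bit n of the result then stands for PTEs [8n, 8n + 7] of the PMD.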
> >
> > khugepaged_scan_bitmap uses an explicit stack of scan_bit_state entries
> > to walk, without recursion, a bitmap that represents chunks of fully
> > utilized regions. We can then determine which mTHP size fits best. The
> > following patch sets this bitmap while scanning the PMD.
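A minimal userspace sketch of the explicit-stack walk, assuming the same push order as the patch (right half pushed before left half, so the left half is popped first). struct scan_state, walk_regions() and the SKETCH_* macros are hypothetical stand-ins, not kernel code; the sketch only records the visit order and leaves out the threshold and collapse logic.

```c
#include <assert.h>

/* Hypothetical mirror of the patch's struct scan_bit_state. */
struct scan_state {
	int order;	/* the region spans (1 << order) chunks */
	int offset;	/* index of the region's first chunk */
};

#define SKETCH_TOP_ORDER 6			/* 64 chunks == one PMD */
#define SKETCH_STACK_MAX (SKETCH_TOP_ORDER + 1)	/* depth-first bound */

/*
 * Visit every power-of-two region of a (1 << top_order)-chunk bitmap,
 * largest first, using an explicit stack instead of recursion. The first
 * out_len visited regions are stored in out[]; the total number of
 * regions visited is returned.
 */
static int walk_regions(int top_order, struct scan_state *out, int out_len)
{
	struct scan_state stack[SKETCH_STACK_MAX];
	int top = -1;
	int visited = 0;

	assert(top_order >= 0 && top_order <= SKETCH_TOP_ORDER);
	stack[++top] = (struct scan_state){ top_order, 0 };

	while (top >= 0) {
		struct scan_state s = stack[top--];

		if (visited < out_len)
			out[visited] = s;
		visited++;

		if (s.order > 0) {
			int half = 1 << (s.order - 1);

			/* Push the right half first so the left half pops first. */
			stack[++top] = (struct scan_state){ s.order - 1, s.offset + half };
			stack[++top] = (struct scan_state){ s.order - 1, s.offset };
		}
	}
	return visited;
}
```

Because the left half is always popped first, the walk is a pre-order, left-to-right traversal, and the stack never holds more than top_order + 1 entries, so a small fixed-size array is enough.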
> >
> > max_ptes_none is used as a scale to determine how "full" an order must
> > be before being considered for collapse.
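Worked numbers for the max_ptes_none scaling, assuming 512 PTEs per PMD. threshold_bits() below is a hypothetical userspace copy of the max_percent/threshold_bits integer arithmetic in the patch.

```c
#include <assert.h>

#define SKETCH_PMD_NR 512	/* assumed: 4K pages, 512 PTEs per PMD */

/* Same integer arithmetic as max_percent/threshold_bits in the patch. */
static int threshold_bits(int max_ptes_none, int num_chunks)
{
	int max_percent = ((SKETCH_PMD_NR - max_ptes_none - 1) * 100) /
			  (SKETCH_PMD_NR - 1);

	return (max_percent * num_chunks) / 100;
}
```

With the default max_ptes_none of 511, max_percent is 0, so every region passes; with 0, every chunk must be set. Because of the integer division, at 255 (roughly 50%) a 64-chunk region needs 32 bits, but a single-chunk region gets a threshold of 0 bits.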
> >
> > Signed-off-by: Nico Pache <npache@xxxxxxxxxx>
> > ---
> > include/linux/khugepaged.h | 4 +-
> > mm/khugepaged.c | 129 +++++++++++++++++++++++++++++++++++--
> > 2 files changed, 126 insertions(+), 7 deletions(-)
> >
>
> [--snip--]
>
> >
> > +// Consume the bitmap iteratively, using an explicit stack
> > +static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
> > + int referenced, int unmapped, struct collapse_control *cc,
> > + bool *mmap_locked, unsigned long enabled_orders)
> > +{
> > + u8 order, offset;
> > + int num_chunks;
> > + int bits_set, max_percent, threshold_bits;
> > + int next_order, mid_offset;
> > + int top = -1;
> > + int collapsed = 0;
> > + int ret;
> > + struct scan_bit_state state;
> > +
> > + cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > + { HPAGE_PMD_ORDER - MIN_MTHP_ORDER, 0 };
> > +
> > + while (top >= 0) {
> > + state = cc->mthp_bitmap_stack[top--];
> > + order = state.order;
> > + offset = state.offset;
> > + num_chunks = 1 << order;
> > + // Skip mTHP orders that are not enabled
> > + if (!((enabled_orders >> (order + MIN_MTHP_ORDER)) & 1))
> > + goto next;
> > +
> > + // Copy the relevant section to a new bitmap
> > + bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
> > + MTHP_BITMAP_SIZE);
> > +
> > + bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
> > +
> > + // Check if the region is "almost full" based on the threshold
> > + max_percent = ((HPAGE_PMD_NR - khugepaged_max_ptes_none - 1) * 100)
> > + / (HPAGE_PMD_NR - 1);
> > + threshold_bits = (max_percent * num_chunks) / 100;
> > +
> > + if (bits_set >= threshold_bits) {
> > + ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
> > + mmap_locked, order + MIN_MTHP_ORDER, offset * MIN_MTHP_NR);
> > + if (ret == SCAN_SUCCEED)
> > + collapsed += (1 << (order + MIN_MTHP_ORDER));
> > + continue;
> > + }
>
> We are going to the lower order when it is not in the allowed mask of
> orders, or when we are below the threshold. What to do when these
> conditions do not happen, and the reason for collapse failure is
> collapse_huge_page()? For example, if you start with a PMD order scan,
> and collapse_huge_page() fails, then you hit "continue", and then exit
> the loop because there is nothing else in the stack, so we exit without
> trying mTHPs.

Thanks for catching that; I introduced that bug when I moved from the
recursive approach to the stack-based one.
This should only continue on SCAN_SUCCEED. Otherwise it needs to fall
through to the next: label so that the lower orders are still tried.

I think I also need to handle the case where no collapse succeeds in
khugepaged_scan_pmd().
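A minimal userspace sketch of the intended control flow, not the actual fix: collapse_huge_page() is modeled by a hypothetical fail_orders mask (collapse fails for every mTHP order whose bit is set), and the max_ptes_none scaling is replaced by a plain threshold_pct. All sk_* names are hypothetical. On success the region is done and the loop continues; on failure, or below the threshold, it falls through to next: and pushes both halves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MIN_ORDER 3	/* stand-in for MIN_MTHP_ORDER */
#define SK_TOP_ORDER 6	/* 64 chunks of 8 PTEs == order 9 */

struct sk_state { int order, offset; };

/* Count the set bits in [offset, offset + nbits) of a 64-bit bitmap. */
static int count_bits(uint64_t bitmap, int offset, int nbits)
{
	int n = 0;

	for (int i = offset; i < offset + nbits; i++)
		n += (bitmap >> i) & 1;
	return n;
}

/* Returns the number of base pages collapsed. */
static int sk_scan_bitmap(uint64_t bitmap, unsigned long enabled_orders,
			  int threshold_pct, unsigned long fail_orders)
{
	struct sk_state stack[SK_TOP_ORDER + 1];
	int top = -1, collapsed = 0;

	stack[++top] = (struct sk_state){ SK_TOP_ORDER, 0 };
	while (top >= 0) {
		struct sk_state s = stack[top--];
		int nchunks = 1 << s.order;
		int mthp_order = s.order + SK_MIN_ORDER;

		/* Parenthesised so '!' applies to the extracted bit. */
		if (!((enabled_orders >> mthp_order) & 1))
			goto next;
		if (count_bits(bitmap, s.offset, nchunks) >=
		    threshold_pct * nchunks / 100) {
			bool ok = !((fail_orders >> mthp_order) & 1);

			if (ok) {
				collapsed += 1 << mthp_order;
				continue;	/* success: skip the halves */
			}
			/* Failure: fall through and try the two halves. */
		}
next:
		if (s.order > 0) {
			int half = nchunks / 2;

			stack[++top] = (struct sk_state){ s.order - 1, s.offset + half };
			stack[++top] = (struct sk_state){ s.order - 1, s.offset };
		}
	}
	return collapsed;
}
```

For example, with a fully populated bitmap, orders 9 and 4 enabled, and the PMD-order collapse failing, this returns 512 (32 order-4 collapses of 16 pages each); with an unconditional continue after the failed PMD attempt it would return 0.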


>
> > +
> > +next:
> > + if (order > 0) {
> > + next_order = order - 1;
> > + mid_offset = offset + (num_chunks / 2);
> > + cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > + { next_order, mid_offset };
> > + cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
> > + { next_order, offset };
> > + }
> > + }
> > + return collapsed;
> > +}
> > +
>