Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory

From: Ryan Roberts
Date: Wed Oct 25 2023 - 12:24:32 EST


On 20/10/2023 13:33, Ryan Roberts wrote:
> On 06/10/2023 21:06, David Hildenbrand wrote:
>> On 29.09.23 13:44, Ryan Roberts wrote:
>>> Hi All,
>>
>
> [...]
>
>>> NOTE: These changes should not be merged until the prerequisites are complete.
>>> These are in progress and tracked at [7].
>>
>> We should probably list them here, and classify which ones we see as strict
>> requirements and which ones might be optimizations.
>>
>
> Bringing back the discussion of prerequisites to this thread following the
> discussion at the mm-alignment meeting on Wednesday.
>
> Slides, updated following discussion to reflect all the agreed items that are
> prerequisites and enhancements, are at [1].
>
> I've taken a closer look at the situation with khugepaged, and can confirm that
> it does correctly collapse anon small-sized THP into PMD-sized THP. I did notice
> though, that one of the khugepaged selftests (collapse_max_ptes_none) fails when
> small-sized THP is enabled+always. So I've fixed that test up and will add the
> patch to the next version of my series.
>
> So I believe the khugepaged prerequisite can be marked as done.
>
> [1]
> https://drive.google.com/file/d/1GnfYFpr7_c1kA41liRUW5YtCb8Cj18Ud/view?usp=sharing&resourcekey=0-U1Mj3-RhLD1JV6EThpyPyA

Hi All,

It's been a week since the mm alignment meeting discussion we had around
prerequisites and the ABI. I haven't heard any further feedback on the ABI
proposal, so I'm going to be optimistic and assume that nobody has found any
fatal flaws in it :).

Certainly, I think it held up to the potential future policies that Yu Zhou
cited on the call - the possibility of preferring a smaller size over a bigger
one, if the smaller size can be allocated without splitting a contiguous block.
I think the suggestion of adding a per-size priority file would solve it. And in
general because we have a per-size directory, that gives us lots of flexibility
for growth.
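
For illustration only, here is a rough sketch of what per-size directories might
look like from userspace, borrowing the hugepages-<size>kB naming that hugetlb
already uses (e.g. hugepages-2048kB). The root path, the helper name
thp_size_path() and the "enabled" and "priority" file names are assumptions made
for this example, not a settled ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed root for the example; the real location depends on the final ABI. */
#define THP_SYSFS_ROOT "/sys/kernel/mm/transparent_hugepage"

/*
 * Build the path of a control file inside a per-size directory, e.g.
 * <root>/hugepages-64kB/enabled for order-4 folios with 4K base pages.
 * Returns the snprintf() result, i.e. the length the full path needs.
 */
static int thp_size_path(char *buf, size_t len, unsigned int order,
                         unsigned long page_size, const char *file)
{
    unsigned long size_kb;

    assert(page_size >= 1024);          /* sizes are reported in kB */
    size_kb = (page_size << order) / 1024;

    return snprintf(buf, len, "%s/hugepages-%lukB/%s",
                    THP_SYSFS_ROOT, size_kb, file);
}
```

Because each size gets its own directory, a new per-size control (such as the
hypothetical "priority" file) is just another file name passed to the same
helper; nothing about the existing files has to change.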

Anyway, given the lack of feedback, I'm proposing to spin a new version. I'm
planning to do the following:

- Drop the accounting patch (#3); we will continue to only account PMD-sized
THP for now. We can add more counters in future if needed. Page cache large
folios haven't needed any new counters yet.

- Pivot to the ABI proposed by DavidH; per-size directories in a similar shape
to that used by hugetlb

- Drop the "recommend" keyword patch (#6); for now, users will need to
know for themselves which sizes benefit performance on their hardware

- Drop patch (#7); arch_wants_pte_order() is no longer needed due to dropping
patch #6

- Add a patch for the khugepaged selftest fix (described in the quoted email
above).

- Ensure that PMD_ORDER is not assumed to be a compile-time constant (the
current code is broken on powerpc)
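
To make the PMD_ORDER point concrete, here is a minimal userspace sketch of the
failure mode, not code from the series. The variables runtime_pmd_shift and
page_shift, their values, and the helper orders_below_pmd() are all made up for
the example; the only claim is the general C rule that a file-scope initializer
must be a constant expression:

```c
#include <assert.h>

/*
 * Stand-ins for an architecture where the PMD size is only known at
 * runtime. Names and values are illustrative assumptions.
 */
static unsigned int runtime_pmd_shift = 21;
static const unsigned int page_shift = 16;

/* Not a constant expression: it reads a variable. */
#define PMD_ORDER (runtime_pmd_shift - page_shift)

/*
 * This pattern only works when PMD_ORDER is a compile-time constant.
 * With the definition above it is rejected by the compiler, because a
 * file-scope initializer must be constant:
 *
 *   static const unsigned long orders = ((1UL << PMD_ORDER) - 1) & ~1UL;
 */

/* Works in both cases: evaluate the expression at the point of use. */
static unsigned long orders_below_pmd(void)
{
    /* Bits 1 .. PMD_ORDER-1 set; order 0 is not a large folio. */
    return ((1UL << PMD_ORDER) - 1) & ~1UL;
}
```

The same reasoning applies to array sizes and build-time assertions that mention
PMD_ORDER: anything evaluated by the compiler has to move to runtime.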

Please shout if you think this is the wrong approach.

On the prerequisites front, we have 2 items still to land:

- compaction; Zi Yan is working on a v2

- NUMA balancing; a developer has signed up and is working on it (I'll leave
them to reveal themselves when they prefer).

Thanks,
Ryan