Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64

From: Zi Yan
Date: Thu Oct 01 2020 - 11:14:25 EST


On 30 Sep 2020, at 7:55, Michal Hocko wrote:

> On Mon 28-09-20 13:53:58, Zi Yan wrote:
>> From: Zi Yan <ziy@xxxxxxxxxx>
>>
>> Hi all,
>>
>> This patchset adds support for 1GB PUD THP on x86_64. It is on top of
>> v5.9-rc5-mmots-2020-09-18-21-23. It is also available at:
>> https://github.com/x-y-z/linux-1gb-thp/tree/1gb_thp_v5.9-rc5-mmots-2020-09-18-21-23
>>
>> Other than PUD THP, we had some discussion on generating THPs and contiguous
>> physical memory via a synchronous system call [0]. I am planning to send out a
>> separate patchset on it later, since I feel that it can be done independently of
>> PUD THP support.
>
> While the technical challenges for the kernel implementation can be
> discussed before the user API is decided I believe we cannot simply add
> something now and then decide about a proper interface. I have raised
> a few basic questions we should find answers for before any
> interface is added. Let me copy them here for easier reference.
Sure. Thank you for doing this.

For this new interface, I think it should generate THPs out of populated
memory regions synchronously. It would complement khugepaged, which
generates THPs asynchronously in the background.

> - THP allocation time - #PF and/or madvise context
I am not sure this is relevant, since the new interface is supposed to
operate on populated memory regions. For THP allocation itself, madvise and
the options in /sys/kernel/mm/transparent_hugepage/defrag should already
give users enough choices.
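
For illustration only (not part of the patchset): the sysfs THP files list
all options on one line and mark the active one with square brackets. A
minimal sketch of reading that setting follows; the helper names
selected_option() and thp_setting() are hypothetical, and THP_DIR assumes
the usual sysfs mount point.

```python
from pathlib import Path

# Assumption: sysfs is mounted at /sys and the kernel was built with THP.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def selected_option(text):
    """Return the bracketed (active) word from a THP sysfs line, or None."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def thp_setting(name):
    """Read the active value of a THP knob such as 'enabled' or 'defrag'.

    Returns None when the file is missing or unreadable.
    """
    try:
        return selected_option((THP_DIR / name).read_text())
    except OSError:
        return None
```

Calling thp_setting("defrag") on a THP-enabled system returns the current
policy name, which tells a user how hard page-fault-time THP allocation
will try before falling back to base pages.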

> - lazy/sync instantiation

I would say the new interface only does sync instantiation. madvise already
provides the lazy instantiation option: apply MADV_HUGEPAGE to populated
memory regions and let khugepaged generate THPs from them.
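
A rough sketch of that existing lazy path, for illustration only: populate
an anonymous region with base pages, then mark it with MADV_HUGEPAGE so
that khugepaged may collapse it later. The function name
populate_and_advise() is hypothetical, and the 2MB PMD THP size assumes
x86_64.

```python
import mmap

HPAGE_PMD_SIZE = 2 * 1024 * 1024  # assumption: x86_64 with 2MB PMD THPs


def populate_and_advise(length=2 * HPAGE_PMD_SIZE):
    """Map anonymous memory, touch every base page, then request THPs.

    Returns (mapping, advised). advised is False when this Python build
    or the running kernel does not accept MADV_HUGEPAGE.
    """
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    # Touch one byte per base page so the whole region is populated.
    for off in range(0, length, mmap.PAGESIZE):
        buf[off] = 1

    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is None or not hasattr(buf, "madvise"):
        return buf, False
    try:
        # Only a hint: khugepaged decides if and when to collapse pages.
        buf.madvise(advice)
    except OSError:
        return buf, False
    return buf, True
```

The default length is two THP sizes because the mapping address is not
guaranteed to be 2MB-aligned; a 4MB region always contains at least one
aligned 2MB range that khugepaged could collapse. Nothing here waits for
the collapse to happen, which is the gap a synchronous interface would
fill.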

> - huge page sizes controllable by the userspace?

It might be good to allow advanced users to choose the page sizes, so they
have better control of their applications. For normal users, we can provide
best-effort service. Different options can be provided for these two cases.
The new interface might also want to tell the user how many THPs were
generated by the call, so they can decide what to do with the memory region.
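
Until such an interface exists, a process can approximate this count from
the AnonHugePages fields in /proc/self/smaps. The sketch below is only an
illustration: the helper names are hypothetical, the 2048 kB default
assumes 2MB PMD THPs on x86_64, and the total covers the whole process
rather than a single memory region.

```python
def anon_huge_kb(smaps_text):
    """Sum all AnonHugePages values (in kB) found in smaps-formatted text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("AnonHugePages:"):
            total += int(line.split()[1])
    return total


def thp_count(smaps_text, thp_kb=2048):
    """Estimate the number of THPs; thp_kb is an assumed THP size in kB."""
    return anon_huge_kb(smaps_text) // thp_kb


def self_thp_count(thp_kb=2048):
    """Estimate THPs backing the calling process; None if smaps is unreadable."""
    try:
        with open("/proc/self/smaps") as f:
            return thp_count(f.read(), thp_kb)
    except OSError:
        return None
```

Comparing self_thp_count() before and after a request gives a coarse view
of how many THPs were created, though it cannot attribute them to a
specific range the way a return value from the new interface could.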

> - aggressiveness - how hard to try

The new interface would try as hard as it can, since I assume users really
want THPs when they use this interface.

> - internal fragmentation - allow to create THPs on sparsely or unpopulated
> ranges

The new interface would only operate on populated memory regions. A
MAP_POPULATE-like option can be added if necessary.


> - do we need some sort of access control or privilege check as some THPs
> would be really scarce (like those that require pre-reservation).

That seems like too much to me. I suppose that if we provide page size options
to users when generating THPs, user applications could coordinate among
themselves. BTW, do we have access control for hugetlb pages? If so, we could
borrow that method.



Best Regards,
Yan Zi