Re: [RFC PATCH 0/3] support large folio for mlock

From: Yin, Fengwei
Date: Sat Jul 08 2023 - 01:01:30 EST




On 7/8/2023 12:45 PM, Yu Zhao wrote:
> On Fri, Jul 7, 2023 at 10:52 AM Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>>
>> Yu mentioned at [1] that mlock() can't be applied to large folios.
>>
>> I studied the related code, and here is my understanding:
>> - For RLIMIT_MEMLOCK, there is no problem, because RLIMIT_MEMLOCK
>> accounting is not tied to the underlying pages. Whether the underlying
>> pages are mlocked or munlocked does not affect RLIMIT_MEMLOCK
>> accounting, which is always correct.
>>
>> - For keeping the pages in RAM, there is no problem either. During
>> try_to_unmap_one(), once reclaim detects that the VMA has the VM_LOCKED
>> bit set in vm_flags, the folio is kept regardless of whether the folio
>> itself is mlocked.
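The reclaim-side point can be illustrated with a simplified userspace model. This is not kernel code: `MODEL_VM_LOCKED`, `struct model_vma` and `model_reclaim_keeps_folio()` are hypothetical names. The real rmap walk in try_to_unmap_one() does much more. It is reduced here to the one decision described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this model only; not the kernel's VM_LOCKED value. */
#define MODEL_VM_LOCKED 0x1UL

/* Hypothetical stand-in for a VMA: only vm_flags matters here. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Simplified model: reclaim visits each VMA that maps the folio. As soon
 * as one of them has VM_LOCKED set, it stops and keeps the folio. The
 * folio's own mlocked state is never consulted.
 */
static bool model_reclaim_keeps_folio(const struct model_vma *mappers, size_t n)
{
	assert(mappers != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (mappers[i].vm_flags & MODEL_VM_LOCKED)
			return true;	/* locked mapping found: keep the folio */
	}
	return false;	/* no locked mapping: the folio may be reclaimed */
}
```

Because the folio's mlocked state never appears in this decision, correctness does not depend on it. The cost is that reclaim still has to walk (and possibly split) the folio before reaching this check.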
>>
>> So mlock works functionally for large folios, but it is not optimal,
>> because page reclaim still needs to scan these large folios and may
>> split them.
>>
>> This series classifies large folios for mlock into two types:
>> - The large folio lies entirely within a VM_LOCKED VMA range
>> - The large folio crosses a VM_LOCKED VMA boundary
>>
>> For the first type, we mlock the large folio so page reclaim skips it.
>> For the second type, we don't mlock the large folio. It may be picked
>> by page reclaim and split, so the pages outside the VM_LOCKED VMA range
>> can be reclaimed or released.
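The two-type classification can be modeled in a few lines of userspace C. This is a sketch under simplifying assumptions, not the code in patch1. It assumes the folio is mapped contiguously at [start, end) inside the VMA's address space. `struct folio_range`, `struct vma_range`, `folio_within_vma()` and `classify_folio()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: user virtual address range [start, end) mapping the folio. */
struct folio_range {
	unsigned long start;
	unsigned long end;
};

/* Hypothetical: a VMA [vm_start, vm_end) plus whether VM_LOCKED is set. */
struct vma_range {
	unsigned long vm_start;
	unsigned long vm_end;
	bool locked;
};

enum folio_mlock_type {
	FOLIO_VMA_UNLOCKED,		/* VMA is not VM_LOCKED at all */
	FOLIO_MLOCK_WHOLE,		/* type 1: folio entirely inside the VMA */
	FOLIO_LEAVE_FOR_RECLAIM,	/* type 2: folio crosses the VMA boundary */
};

/* True if the folio's whole mapped range lies inside the VMA. */
static bool folio_within_vma(const struct folio_range *f, const struct vma_range *v)
{
	assert(f->start < f->end && v->vm_start < v->vm_end);
	return f->start >= v->vm_start && f->end <= v->vm_end;
}

static enum folio_mlock_type classify_folio(const struct folio_range *f,
					    const struct vma_range *v)
{
	if (!v->locked)
		return FOLIO_VMA_UNLOCKED;
	/* Only a folio fully inside the locked VMA is mlocked as a whole. */
	return folio_within_vma(f, v) ? FOLIO_MLOCK_WHOLE : FOLIO_LEAVE_FOR_RECLAIM;
}
```

A folio that crosses the boundary is deliberately left unmlocked. Reclaim can then split it, and the pages outside the locked range become reclaimable.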
>
> This is a sound design, which is also what I have in mind. I see the
> rationales are being spelled out in this thread, and hopefully
> everyone can be convinced.
>
>> patch1 introduces an API to check whether a large folio is within a VMA range.
>> patch2 makes page reclaim, mlock_vma_folio() and munlock_vma_folio()
>> support large folio mlock/munlock.
>> patch3 makes the mlock/munlock syscalls support large folios.
>
> Could you tidy up the last patch a little bit? E.g., saying "mlock the
> 4K folio" is obviously not the best idea.
>
> And if it's possible, make the loop just look like before, i.e.,
>
> if (!can_mlock_entire_folio())
> 	continue;
> if (vma->vm_flags & VM_LOCKED)
> 	mlock_folio_range();
> else
> 	munlock_folio_range();
This can leave a large folio mlocked even after user space calls munlock()
on the range. Consider the following case:
1. mlock() a 64K range; the underlying 64K large folio is mlocked.
2. mprotect() the first 32K of the range with a different prot, which
   triggers a VMA split.
3. munlock() the 64K range. Because the 64K large folio is not entirely
   within either of the two new VMAs, it will not be munlocked. It can
   then only be reclaimed after it is unmapped from both VMAs, rather
   than as soon as the range is munlocked.
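The sequence above can be replayed in a small userspace simulation of the loop quoted from Yu. It is only a model. `sim_can_mlock_entire_folio()` is a hypothetical stand-in for can_mlock_entire_folio(). The folio's mlocked state is a plain bool, and the VMA split is written out by hand instead of going through mprotect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SZ_32K 0x8000UL
#define SZ_64K 0x10000UL

/* Hypothetical VMA model: [vm_start, vm_end) plus VM_LOCKED state. */
struct sim_vma {
	unsigned long vm_start, vm_end;
	bool locked;
};

/* Hypothetical folio model: mapped range and mlocked state. */
struct sim_folio {
	unsigned long start, end;
	bool mlocked;
};

/* Hypothetical stand-in for can_mlock_entire_folio(): folio fully inside VMA. */
static bool sim_can_mlock_entire_folio(const struct sim_folio *f, const struct sim_vma *v)
{
	return f->start >= v->vm_start && f->end <= v->vm_end;
}

/* The quoted loop body, applied to one folio in one VMA. */
static void sim_apply_vma(struct sim_folio *f, const struct sim_vma *v)
{
	if (!sim_can_mlock_entire_folio(f, v))
		return;			/* "continue": folio state left untouched */
	f->mlocked = v->locked;		/* mlock_folio_range() / munlock_folio_range() */
}

/* Runs steps 1-3 and reports whether the folio is still mlocked at the end. */
static bool sim_folio_stays_mlocked(void)
{
	struct sim_folio folio = { 0, SZ_64K, false };

	/* 1. mlock() the 64K range: a single VMA covers the whole folio. */
	struct sim_vma whole = { 0, SZ_64K, true };
	sim_apply_vma(&folio, &whole);
	assert(folio.mlocked);

	/* 2. mprotect() the first 32K: the VMA splits, both halves still locked. */
	struct sim_vma split[2] = {
		{ 0, SZ_32K, true },
		{ SZ_32K, SZ_64K, true },
	};

	/* 3. munlock() the 64K range: clear VM_LOCKED, revisit the folio per VMA. */
	for (size_t i = 0; i < 2; i++) {
		split[i].locked = false;
		sim_apply_vma(&folio, &split[i]);
	}
	return folio.mlocked;
}
```

In this model sim_folio_stays_mlocked() returns true. After step 3 neither VMA covers the whole folio, so the loop skips it in both, and it keeps its mlocked state.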


Regards
Yin, Fengwei

>
> Thanks.