Re: [RFC PATCH 0/3] Add hugetlb MADV_DONTNEED support

From: Mike Kravetz
Date: Thu Jan 27 2022 - 12:56:22 EST


On 1/27/22 03:57, David Hildenbrand wrote:
> On 13.01.22 19:03, Mike Kravetz wrote:
>> Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
>> testing. However, mremap support was recently added in commit
>> 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
>> vma"). While attempting to enable mremap support in the test, it was
>> discovered that the mremap test indirectly depends on MADV_DONTNEED.
>>
>> hugetlb does not support MADV_DONTNEED. However, the only thing
>> preventing support is a check in can_madv_lru_vma(). Simply removing
>> the check will enable support.
>>
>> This is sent as a RFC because there is no existing use case calling
>> for hugetlb MADV_DONTNEED support except possibly the userfaultfd test.
>> However, adding support makes sense as it is fairly trivial and brings
>> hugetlb functionality more in line with 'normal' memory.
>>
>
> Just a note:
>
> QEMU doesn't use huge anonymous memory directly (MAP_ANON | MAP_HUGE...)
> but instead always goes either via hugetlbfs or via memfd.
>
> For MAP_PRIVATE hugetlb mappings, fallocate(FALLOC_FL_PUNCH_HOLE) seems
> to get the job done (IOW: also discards private anon pages). See the
> comments in the QEMU code below. I remember that that is somewhat
> inconsistent. For ordinary MAP_PRIVATE mapped files I remember that we
> always need fallocate(FALLOC_FL_PUNCH_HOLE) + madvise(QEMU_MADV_DONTNEED)
> to make sure
>
> a) All file pages are removed
> b) All private anon pages are removed
>
> IIRC hugetlbfs really is different in that regard, but maybe other fs
> behave similarly.

Yes it is really different. And, some might even consider that a bug?
Imagine if those private anon (COW) pages contain important data. They
could be unmapped/freed by some other process that has write access to
the hugetlb file on which the private mapping is based.

I believe this same issue exists for hugetlbfs ftruncate. When fallocate
hole punch support was added, it was based on the ftruncate functionality.

I am hesitant to change the behavior of hugetlb hole punch or truncate
as people may be relying on that behavior today. Your QEMU example is
one such case.

Thanks,
--
Mike Kravetz