Re: [PATCH] mm: huge_memory: a new debugfs interface for splitting THP tests.

From: David Hildenbrand
Date: Mon Mar 08 2021 - 14:31:56 EST


On 08.03.21 20:11, Yang Shi wrote:
On Mon, Mar 8, 2021 at 11:01 AM Zi Yan <ziy@xxxxxxxxxx> wrote:

On 8 Mar 2021, at 13:11, David Hildenbrand wrote:

On 08.03.21 18:49, Zi Yan wrote:
On 8 Mar 2021, at 11:17, David Hildenbrand wrote:

On 08.03.21 16:22, Zi Yan wrote:
From: Zi Yan <ziy@xxxxxxxxxx>

By writing "<pid>,<vaddr_start>,<vaddr_end>" to
<debugfs>/split_huge_pages_in_range_pid, THPs in the process with the
given pid and virtual address range are split. It is used to test the
split_huge_page function. In addition, a selftest program is added to
tools/testing/selftests/vm to utilize the interface by splitting
PMD THPs and PTE-mapped THPs.
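
For illustration only, a minimal sketch of how a test program might drive the interface described above. The debugfs mount point (/sys/kernel/debug), the hexadecimal address formatting, and the helper names format_split_request() and request_split() are assumptions, not taken from the patch; only the file name and the "<pid>,<vaddr_start>,<vaddr_end>" layout come from the description.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: debugfs is mounted at /sys/kernel/debug. */
#define SPLIT_PATH "/sys/kernel/debug/split_huge_pages_in_range_pid"

/*
 * Build "<pid>,<vaddr_start>,<vaddr_end>" in buf.
 * Printing the addresses in hex is an assumption of this sketch.
 * Returns the string length, or -1 if it does not fit in buf.
 */
int format_split_request(char *buf, size_t len, pid_t pid,
                         unsigned long start, unsigned long end)
{
    assert(buf != NULL && start < end);

    int n = snprintf(buf, len, "%d,0x%lx,0x%lx", (int)pid, start, end);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Write the request to the debugfs file.
 * Returns 0 on success and -1 on any error (debugfs not mounted,
 * insufficient privileges, or a kernel without this interface).
 */
int request_split(pid_t pid, unsigned long start, unsigned long end)
{
    char buf[64];
    int n = format_split_request(buf, sizeof(buf), pid, start, end);

    if (n < 0)
        return -1;

    int fd = open(SPLIT_PATH, O_WRONLY);

    if (fd < 0)
        return -1;

    ssize_t written = write(fd, buf, (size_t)n);

    close(fd);
    return written == n ? 0 : -1;
}
```

A caller would typically pass getpid() together with the bounds of a region it has already populated with THPs, e.g. request_split(getpid(), start, start + len).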

Won't something like

1. MADV_HUGEPAGE

2. Access memory

3. MADV_NOHUGEPAGE

Have a similar effect? What's the benefit of this?

Thanks for checking the patch.

No, MADV_NOHUGEPAGE just replaces VM_HUGEPAGE with VM_NOHUGEPAGE,
nothing else will be done.
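
For reference, the three-step sequence asked about above can be sketched as below. This assumes a 2 MiB PMD-sized THP (as on x86-64); thp_hint_sequence() and align_up() are hypothetical helper names. Per the reply, step 3 only changes the VMA flag, so any THP faulted in at step 2 is expected to stay intact. The sketch does not verify whether a THP was actually allocated.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized huge pages, as on x86-64. */
#define HPAGE_SIZE (2UL << 20)

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Returns 0 if the region could be mapped and its contents survive the
 * sequence, -1 otherwise. madvise() failures (e.g. a kernel built
 * without THP support) are reported but not treated as fatal.
 */
int thp_hint_sequence(void)
{
    size_t len = 2 * HPAGE_SIZE; /* room for one aligned huge page */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (map == MAP_FAILED)
        return -1;

    char *huge = (char *)align_up((uintptr_t)map, HPAGE_SIZE);

    /* 1. MADV_HUGEPAGE: mark the range as a THP candidate. */
    if (madvise(huge, HPAGE_SIZE, MADV_HUGEPAGE))
        perror("MADV_HUGEPAGE");

    /* 2. Access memory: fault the range in. */
    memset(huge, 0x5a, HPAGE_SIZE);

    /* 3. MADV_NOHUGEPAGE: only swaps VM_HUGEPAGE for VM_NOHUGEPAGE. */
    if (madvise(huge, HPAGE_SIZE, MADV_NOHUGEPAGE))
        perror("MADV_NOHUGEPAGE");

    /* The data must be unchanged regardless of the backing page size. */
    int ok = huge[0] == 0x5a && huge[HPAGE_SIZE - 1] == 0x5a;

    munmap(map, len);
    return ok ? 0 : -1;
}
```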

Ah, okay - maybe my memory was tricking me. There is some s390x KVM code that forces MADV_NOHUGEPAGE and force-splits everything.

I do wonder, though, if this functionality would be worth a proper user interface (e.g., madvise). There might be actual benefit in having this as a !debug interface.

I think you are aware of the discussion in https://lkml.kernel.org/r/d098c392-273a-36a4-1a29-59731cdf5d3d@xxxxxxxxxx

Yes. Thanks for bringing this up.


If there will be an interface to collapse a THP -- "this memory area is worth extra performance now by collapsing a THP if possible" -- it might also be helpful to have the opposite functionality -- "this memory area is not worth a THP, rather use that somewhere else".

MADV_HUGE_COLLAPSE vs. MADV_HUGE_SPLIT

I agree that MADV_HUGE_SPLIT would be useful as the opposite of COLLAPSE when a user might just want PAGESIZE mappings.
Right now, HUGE_SPLIT is implicit from mapping changes like mprotect or MADV_DONTNEED.

IMHO, it doesn't sound very useful. MADV_DONTNEED would split the PMD for
any partial THP. If the range covers the whole THP, the whole THP is going
to be freed anyway. All other places in the kernel which need to split THPs
have been covered. So I don't see any use case from userspace for
just splitting PMDs to PTEs.
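
The partial-range case mentioned above can be sketched from userspace as follows. This again assumes 2 MiB PMD-sized THPs; partial_dontneed() is a hypothetical helper name. The check only confirms the documented MADV_DONTNEED semantics for private anonymous memory (the discarded page reads back as zeroes, its neighbours keep their data). It does not observe the PMD split itself, which would need something like /proc/<pid>/smaps.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB PMD-sized huge pages, as on x86-64. */
#define HPAGE_SIZE (2UL << 20)

/*
 * Discard a single base page in the middle of a THP-eligible region.
 * Returns 0 if the discarded page reads back zero-filled while the
 * bytes on either side keep their contents, -1 otherwise.
 */
int partial_dontneed(void)
{
    long psize = sysconf(_SC_PAGESIZE);

    assert(psize > 0);

    size_t len = 2 * HPAGE_SIZE; /* room for one aligned huge page */
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (map == MAP_FAILED)
        return -1;

    /* Align to the huge page size so a PMD mapping is possible. */
    char *huge = (char *)(((uintptr_t)map + HPAGE_SIZE - 1) &
                          ~(uintptr_t)(HPAGE_SIZE - 1));

    /* Best effort: the kernel may or may not back this with a THP. */
    madvise(huge, HPAGE_SIZE, MADV_HUGEPAGE);
    memset(huge, 0x5a, HPAGE_SIZE);

    /* Partial range: one base page, so the THP cannot be freed whole. */
    char *victim = huge + HPAGE_SIZE / 2;
    int rc = madvise(victim, (size_t)psize, MADV_DONTNEED);

    int ok = rc == 0 &&
             victim[0] == 0 && victim[psize - 1] == 0 && /* discarded */
             victim[-1] == 0x5a && victim[psize] == 0x5a; /* untouched */

    munmap(map, len);
    return ok ? 0 : -1;
}
```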

THPs are a limited resource. So indicating which virtual memory regions are not performance sensitive right now (e.g., cold pages in a database) and not worth a THP might be quite valuable, no?

--
Thanks,

David / dhildenb