Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory

From: Ryan Roberts
Date: Tue Oct 31 2023 - 07:56:44 EST


On 31/10/2023 11:50, Ryan Roberts wrote:
> On 06/10/2023 21:06, David Hildenbrand wrote:
> [...]
>>
>> Change 2: sysfs interface.
>>
>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
>> agree.
>>
>> What we expose there and how, is TBD. Again, not a friend of "orders" and
>> bitmaps at all. We can do better if we want to go down that path.
>>
>> Maybe we should take a look at hugetlb, and how they added support for multiple
>> sizes. What *might* make sense could be (depending on which values we actually
>> support!)
>>
>>
>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
>>
>> Each one would contain an "enabled" and "defrag" file. We want something minimal
>> first? Start with the "enabled" option.
>>
>>
>> enabled: always [global] madvise never
>>
>> Initially, we would set it for PMD-sized THP to "global" and for everything else
>> to "never".
>
> Hi David,
>
> I've just started coding this, and it occurs to me that I might need a small
> clarification here; the existing global "enabled" control is used to drive
> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> proposed new per-size "enabled" is implicitly only controlling anon memory (for
> now).
>
> 1) Is this potentially confusing for the user? Should we rename the per-size
> controls to "anon_enabled"? Or is it preferable to jsut keep it vague for now so
> we can reuse the same control for file-backed memory in future?
>
> 2) The global control will continue to drive the file-backed memory decision
> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
>
> Thanks,
> Ryan
>

Also, an implementation question:

hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
(although it does honour MADV_NOHUGEPAGE and the prctl); It will return true
regardless. Is that by design? It couldn't fathom any reasoning from the commit log:

bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
bool smaps, bool in_pf, bool enforce_sysfs)
{
if (!vma->vm_mm) /* vdso */
return false;

/*
* Explicitly disabled through madvise or prctl, or some
* architectures may disable THP for some mappings, for
* example, s390 kvm.
* */
if ((vm_flags & VM_NOHUGEPAGE) ||
test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
return false;
/*
* If the hardware/firmware marked hugepage support disabled.
*/
if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
return false;

/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
if (vma_is_dax(vma))
return in_pf; <<<<<<<<

...
}