Re: [RFC PATCH 00/16] 1GB THP support on x86_64

From: Matthew Wilcox
Date: Tue Sep 08 2020 - 16:01:29 EST

On Tue, Sep 08, 2020 at 10:05:11AM -0400, Zi Yan wrote:
> On 8 Sep 2020, at 7:57, David Hildenbrand wrote:
> > I have concerns if we would silently use 1~GB THPs in most scenarios
> > where be would have used 2~MB THP. I'd appreciate a trigger to
> > explicitly enable that - MADV_HUGEPAGE is not sufficient because some
> > applications relying on that assume that the THP size will be 2~MB
> > (especially, if you want sparse, large VMAs).
> This patchset is not intended to silently use 1GB THP in place of 2MB THP.
> First of all, there is a knob /sys/kernel/mm/transparent_hugepage/enable_1GB
> to enable 1GB THP explicitly. Also, 1GB THP is allocated from a reserved CMA
> region (although I had alloc_contig_pages as a fallback, which can be removed
> in next version), so users need to add hugepage_cma=nG kernel parameter to
> enable 1GB THP allocation. If a finer control is necessary, we can add
> a new MADV_HUGEPAGE_1GB for 1GB THP.

I think we do need that flag. Machines don't run a single workload
(arguably with VMs, we're getting closer to going back to the single
workload per machine, but that's a different matter). So if there's
one app that wants 2MB pages and one that wants 1GB pages, we need to
be able to distinguish them.

I could also see there being an app which benefits from 1GB for
one mapping and prefers 2GB for a different mapping, so I think the
per-mapping madvise flag is best.

I'm a little wary of encoding the size of an x86 PUD in the Linux API
though. Probably best to follow the example set in
include/uapi/asm-generic/hugetlb_encode.h, but I don't love it. I
don't have a better suggestion though.