Re: [RFC PATCH 00/16] 1GB THP support on x86_64

From: Michal Hocko
Date: Tue Sep 08 2020 - 15:17:25 EST


On Tue 08-09-20 10:05:11, Zi Yan wrote:
> On 8 Sep 2020, at 7:57, David Hildenbrand wrote:
>
> > On 03.09.20 18:30, Roman Gushchin wrote:
> >> On Thu, Sep 03, 2020 at 05:23:00PM +0300, Kirill A. Shutemov wrote:
> >>> On Wed, Sep 02, 2020 at 02:06:12PM -0400, Zi Yan wrote:
> >>>> From: Zi Yan <ziy@xxxxxxxxxx>
> >>>>
> >>>> Hi all,
> >>>>
> >>>> This patchset adds support for 1GB THP on x86_64. It is on top of
> >>>> v5.9-rc2-mmots-2020-08-25-21-13.
> >>>>
> >>>> 1GB THP is more flexible for reducing translation overhead and increasing the
> >>>> performance of applications with large memory footprint without application
> >>>> changes compared to hugetlb.
> >>>
> >>> This statement needs a lot of justification. I don't see 1GB THP as viable
> >>> for any workload. Opportunistic 1GB allocation is a very questionable
> >>> strategy.
> >>
> >> Hello, Kirill!
> >>
> >> I share your skepticism about opportunistic 1 GB allocations; however, they might be useful
> >> if backed by a madvise() annotation from the userspace application. In this case,
> >> 1 GB THPs might be an alternative to 1 GB hugetlbfs pages, but with a more convenient
> >> interface.
> >
> > I have concerns if we would silently use 1 GB THPs in most scenarios
> > where we would have used 2 MB THPs. I'd appreciate a trigger to
> > explicitly enable that - MADV_HUGEPAGE is not sufficient because some
> > applications relying on that assume that the THP size will be 2 MB
> > (especially if you want sparse, large VMAs).
>
> This patchset is not intended to silently use 1GB THP in place of 2MB THP.
> First of all, there is a knob /sys/kernel/mm/transparent_hugepage/enable_1GB
> to enable 1GB THP explicitly. Also, 1GB THP is allocated from a reserved CMA
> region (although I had alloc_contig_pages as a fallback, which can be removed
> in the next version), so users need to add the hugepage_cma=nG kernel parameter
> to enable 1GB THP allocation. If finer control is necessary, we can add
> a new MADV_HUGEPAGE_1GB for 1GB THP.

A global knob is insufficient. 1G pages will become a very precious
resource because they require pre-allocation (reservation). So their use
really has to be opt-in, and the question is whether some sort of
access control is needed as well.
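
For reference, a minimal sketch of what per-mapping opt-in looks like with
the existing MADV_HUGEPAGE advice. The helper name map_with_thp_hint() is
made up for illustration. The MADV_HUGEPAGE_1GB flag mentioned above is only
a proposal in this thread, so the sketch does not use it. A 1G variant would
presumably follow the same pattern, with some access check on top.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory and opt this
 * single mapping in to THP via madvise(MADV_HUGEPAGE).
 *
 * Returns the mapping, or NULL if mmap() fails. *advised is set to 1 if
 * the kernel accepted the advice and to 0 otherwise (for example when
 * THP is not built in). The advice is only a hint, so the mapping stays
 * usable either way.
 */
static void *map_with_thp_hint(size_t len, int *advised)
{
    assert(advised != NULL);
    *advised = 0;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Opt-in is scoped to [p, p + len), not to the whole system. */
    if (madvise(p, len, MADV_HUGEPAGE) == 0)
        *advised = 1;

    return p;
}
```

The point of the sketch is the scope: the request is attached to one range
of one process, which is where an opt-in (and any permission check) for a
reserved 1G pool would most naturally sit.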

--
Michal Hocko
SUSE Labs