Re: [PATCH v7 0/5] Free the 2nd vmemmap page associated with each HugeTLB page

From: Mike Kravetz
Date: Wed Feb 09 2022 - 17:49:48 EST


On 2/8/22 23:44, Muchun Song wrote:
> On Wed, Jan 26, 2022 at 4:04 PM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
>>
>> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton
>> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
>>>
>>>> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 11/8/21 12:16 AM, Muchun Song wrote:
>>>>>>> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@xxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> This series can minimize the overhead of struct page for 2MB HugeTLB pages
>>>>>>>> significantly. It further reduces the overhead of struct page by 12.5% for
>>>>>>>> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
>>>>>>>> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
>>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Ping guys. Does anyone have any comments or suggestions
>>>>>>> on this series?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>> I did look over the series earlier. I have no issue with the hugetlb and
>>>>>> vmemmap modifications as they are enhancements to the existing
>>>>>> optimizations. My primary concern is the (small) increased overhead
>>>>>> for the helpers as outlined in your cover letter. Since these helpers
>>>>>> are not limited to hugetlb and used throughout the kernel, I would
>>>>>> really like to get comments from others with a better understanding of
>>>>>> the potential impact.
>>>>>
>>>>> Thanks Mike. I'd like to hear others' comments about this as well.
>>>>> From my point of view, maybe the (small) overhead is acceptable
>>>>> since it only affects the head page, however Matthew Wilcox's folio
>>>>> series could reduce this situation as well.
>>>
>>> I think Mike was inviting you to run some tests to quantify the
>>> overhead ;)
>>
>> Hi Andrew,
>>
>> Sorry for the late reply.
>>
>> Specific overhead figures are already in the cover letter. Also,
>> I did some other tests, e.g. kernel compilation, sysbench. I didn't
>> see any regressions.
>
> The overhead is introduced by page_fixed_fake_head() which
> has an "if" statement and an access to a possible cold cache line.
> I think the main overhead is from the latter. However, probabilistically,
> only 1/64 of the pages need to do the latter. And
> page_fixed_fake_head() is already simple (I mean the overhead
> is small enough) and many performance bottlenecks in mm are
> not in compound_head(). This also matches the tests I did.
> I didn't see any regressions after enabling this feature.
>
> I know Mike's concern is the increased overhead for use cases
> beyond HugeTLB. If we really want to avoid the access to
> a possible cold cache line, we can introduce a new page
> flag like PG_hugetlb and test whether it is set in page->flags;
> if so, return the real head page struct. Then
> page_fixed_fake_head() would look like below.
>
> static __always_inline const struct page *
> page_fixed_fake_head(const struct page *page)
> {
> 	if (!hugetlb_free_vmemmap_enabled())
> 		return page;
>
> 	if (test_bit(PG_hugetlb, &page->flags)) {
> 		unsigned long head = READ_ONCE(page[1].compound_head);
>
> 		if (likely(head & 1))
> 			return (const struct page *)(head - 1);
> 	}
> 	return page;
> }
>
> But I don't think it's worth doing this.
>
> Hi Mike and Andrew,
>
> Since these helpers are not limited to hugetlb and used throughout the
> kernel, I would really like to get comments from others with a better
> understanding of the potential impact. Do you have any appropriate
> reviewers to invite?
>

I think the appropriate people are already on Cc as they provided input on
the original vmemmap optimization series.

The question that needs to be answered is simple enough: Is the savings of
one vmemmap page per hugetlb page worth the extra minimal overhead in
compound_head()? Like most things, this depends on workload.

One thing to note is that compound_head() overhead is only introduced if
hugetlb vmemmap freeing is enabled. Correct? During the original vmemmap
optimization discussions, people thought it important that this be 'opt in'.
I do not know if distros will enable this by default. But perhaps the
potential overhead can be thought of as just part of 'opting in' for
vmemmap optimizations.
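
For anyone weighing that trade-off, the figures quoted above (12.5%, 2GB per
1TB, 1/64) can be sanity-checked with a little arithmetic. This is only a
rough sketch: it assumes 4 KiB base pages and a 64-byte struct page, neither
of which is stated in this thread, and the helper names are made up for
illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not stated in this thread): 4 KiB base pages and a
 * 64-byte struct page, as on a typical x86_64 configuration. */
#define BASE_PAGE_SIZE   4096ULL
#define STRUCT_PAGE_SIZE 64ULL

/* Hypothetical helper: struct pages that fit in one vmemmap page.
 * Only the first of them is page aligned, hence the "1/64" figure. */
static uint64_t struct_pages_per_vmemmap_page(void)
{
	return BASE_PAGE_SIZE / STRUCT_PAGE_SIZE;
}

/* Hypothetical helper: vmemmap pages backing one HugeTLB page. */
static uint64_t vmemmap_pages_per_hugepage(uint64_t hugepage_size)
{
	assert(hugepage_size % BASE_PAGE_SIZE == 0);
	uint64_t nr_struct_pages = hugepage_size / BASE_PAGE_SIZE;

	return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* Hypothetical helper: bytes saved over total_hugetlb bytes of HugeTLB
 * memory when one more vmemmap page is freed per HugeTLB page. */
static uint64_t extra_bytes_saved(uint64_t total_hugetlb,
				  uint64_t hugepage_size)
{
	return total_hugetlb / hugepage_size * BASE_PAGE_SIZE;
}
```

Under those assumptions a 2MB HugeTLB page is backed by 8 vmemmap pages, so
freeing one more of them is 1/8 = 12.5% of the struct page overhead, and
extra_bytes_saved(1ULL << 40, 2ULL << 20) comes to 2GB per 1TB of HugeTLB.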
--
Mike Kravetz