Re: [RFC PATCH 00/16] 1GB THP support on x86_64

From: Kirill A. Shutemov
Date: Thu Sep 03 2020 - 10:47:18 EST


On Wed, Sep 02, 2020 at 02:06:12PM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@xxxxxxxxxx>
>
> Hi all,
>
> This patchset adds support for 1GB THP on x86_64. It is on top of
> v5.9-rc2-mmots-2020-08-25-21-13.
>
> 1GB THP is more flexible for reducing translation overhead and increasing the
> performance of applications with large memory footprint without application
> changes compared to hugetlb.

This statement needs a lot of justification. I don't see 1GB THP as viable
for any workload. Opportunistic 1GB allocation is a very questionable
strategy.

> Design
> =======
>
> 1GB THP implementation looks similar to existing THP code except for some new designs
> for the additional page table level.
>
> 1. Page table deposit and withdraw using a new pagechain data structure:
> instead of one PTE page table page, 1GB THP requires 513 page table pages
> (one PMD page table page and 512 PTE page table pages) to be deposited
> at the page allocation time, so that we can split the page later. Currently,
> the page table deposit is using ->lru, thus only one page can be deposited.

False. Current code can deposit an arbitrary number of page tables.

What can be a problem for you is that these page tables are tied to the
struct page of the PMD page table.
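
To illustrate the point about depositing more than one table, here is a
rough userspace model, not kernel code. It assumes each deposited page table
carries its own link field, so tables can be chained off a single per-PMD
slot. The names (struct pt_page, struct deposit_slot, deposit, withdraw) are
hypothetical; the count of 513 is the one PMD table plus 512 PTE tables
from the quoted text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct page of a page table page. */
struct pt_page {
	struct pt_page *next;	/* plays the role of the ->lru linkage */
};

/* Hypothetical stand-in for the per-PMD deposit slot. */
struct deposit_slot {
	struct pt_page *head;
	size_t count;
};

/* 512 entries in a PMD table, each needing its own PTE table on split. */
enum {
	PMD_ENTRIES = 512,
	TABLES_PER_PUD_THP = 1 + PMD_ENTRIES,	/* 1 PMD table + 512 PTE tables */
};

/* Push one page table onto the deposit chain. */
static void deposit(struct deposit_slot *slot, struct pt_page *pt)
{
	assert(pt != NULL);
	pt->next = slot->head;
	slot->head = pt;
	slot->count++;
}

/* Pop the most recently deposited page table, or NULL if none is left. */
static struct pt_page *withdraw(struct deposit_slot *slot)
{
	struct pt_page *pt = slot->head;

	if (!pt)
		return NULL;
	slot->head = pt->next;
	pt->next = NULL;
	slot->count--;
	return pt;
}
```

In this model the chain length is unbounded, but every deposited table hangs
off one slot, which mirrors the tie to the PMD page table's struct page.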

> A new pagechain data structure is added to enable multi-page deposit.
>
> 2. Triple mapped 1GB THP : 1GB THP can be mapped by a combination of PUD, PMD,
> and PTE entries. Mixing PUD and PTE mapping can be achieved with existing
> PageDoubleMap mechanism. To add PMD mapping, PMDPageInPUD and
> sub_compound_mapcount are introduced. PMDPageInPUD is the 512-aligned base
> page in a 1GB THP and sub_compound_mapcount counts the PMD mapping by using
> page[N*512 + 3].compound_mapcount.

I had a hard time reasoning about DoubleMap vs. rmap. Good for you if you
get it right.
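
For reference, the indexing in the quoted item 2 can be sketched in plain C,
assuming 4KB base pages and 512 entries per page table on x86_64. The helper
names below are hypothetical and are not taken from the patchset; the offset
of 3 comes directly from the quoted page[N*512 + 3].

```c
#include <assert.h>
#include <stddef.h>

enum {
	ENTRIES_PER_TABLE = 512,
	SUBPAGES_PER_PMD = ENTRIES_PER_TABLE,				/* 2MB / 4KB */
	SUBPAGES_PER_PUD = ENTRIES_PER_TABLE * ENTRIES_PER_TABLE,	/* 1GB / 4KB */
	SUB_MAPCOUNT_OFFSET = 3,	/* from page[N*512 + 3] in the quoted text */
};

/* Index of the first base page of the N-th PMD-sized chunk in a 1GB THP. */
static size_t pmd_base_subpage(size_t n)
{
	assert(n < ENTRIES_PER_TABLE);
	return n * SUBPAGES_PER_PMD;
}

/* Index of the base page said to hold the N-th chunk's PMD map count. */
static size_t sub_mapcount_subpage(size_t n)
{
	return pmd_base_subpage(n) + SUB_MAPCOUNT_OFFSET;
}

/* True if idx is a 512-aligned base page, i.e. a PMDPageInPUD candidate. */
static int is_pmd_base(size_t idx)
{
	return idx < SUBPAGES_PER_PUD && idx % SUBPAGES_PER_PMD == 0;
}
```

Under these assumptions a 1GB THP spans 262144 base pages in 512 PMD-sized
chunks, and each chunk keeps its PMD map count three base pages past its
512-aligned start.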

> 3. Using CMA allocation for 1GB THP: instead of bumping MAX_ORDER, it is more sane
> to use something less intrusive. So all 1GB THPs are allocated from reserved
> CMA areas shared with hugetlb. At page splitting time, the bitmap for the 1GB
> THP is cleared as the resulting pages can be freed via normal page free path.
> We can fall back to alloc_contig_pages for 1GB THP if necessary.
>

--
Kirill A. Shutemov