[PATCH mm-unstable v2 0/3] mm/hugetlb: alloc/free gigantic folios

From: Yu Zhao
Date: Tue Aug 13 2024 - 23:55:05 EST


Using __GFP_COMP for gigantic folios can greatly reduce not only the
amount of code but also the allocation and free time.

Approximate LOC change in mm/hugetlb.c: +60, -240

Allocate and free 500 1GB hugeTLB folios without HVO by:
  time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
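
For anyone reproducing the numbers, the two commands above can be wrapped in a small helper that also reads the pool size back, since a write to nr_hugepages can leave the pool smaller than requested. This is only a sketch: the resize_pool name and the NR_FILE override are made up for illustration, and date +%s gives whole-second resolution only, which is coarse but adequate for runs of several seconds.

```shell
#!/bin/sh
# Sketch: time a resize of the 1GB hugeTLB pool and verify the result.
# NR_FILE (hypothetical override) lets the helper run against a scratch file.
NR_FILE="${NR_FILE:-/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages}"

# resize_pool COUNT
# Writes COUNT to NR_FILE, then prints the requested size, the size read
# back, and the elapsed wall-clock seconds. Returns non-zero if the write
# failed (2) or the pool did not reach the requested size (1).
resize_pool() {
    want="$1"
    start=$(date +%s)
    echo "$want" > "$NR_FILE" || return 2
    end=$(date +%s)
    got=$(cat "$NR_FILE")
    echo "requested=$want got=$got elapsed=$((end - start))s"
    [ "$got" -eq "$want" ]
}

# Usage (as root, on a machine with enough free memory):
#   resize_pool 500   # allocate
#   resize_pool 0     # free
```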

         Before  After
  Alloc  ~13s    ~10s
  Free   ~15s    <1s

The magnitude of these improvements generally holds across multiple
x86 and arm64 CPU models.
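
To put the totals in per-folio terms, the approximate times can be divided by the 500 folios in the test. The sketch below does this in integer milliseconds; per_folio_ms is a name made up for illustration, and because the inputs are the rounded figures from the table, the results are rough. The "<1s" free time is entered as 1, so the last value is an upper bound.

```shell
#!/bin/sh
# per_folio_ms SECONDS COUNT
# Prints the average cost per folio in whole milliseconds.
per_folio_ms() {
    echo $(( $1 * 1000 / $2 ))
}

per_folio_ms 13 500   # alloc before: 26
per_folio_ms 10 500   # alloc after:  20
per_folio_ms 15 500   # free before:  30
per_folio_ms 1 500    # free after:   2 (upper bound, from "<1s")
```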

Perf profile before:
  Alloc
    - 99.99% alloc_pool_huge_folio
      - __alloc_fresh_hugetlb_folio
        - 83.23% alloc_contig_pages_noprof
          - 47.46% alloc_contig_range_noprof
            - 20.96% isolate_freepages_range
                16.10% split_page
            - 14.10% start_isolate_page_range
            - 12.02% undo_isolate_page_range

  Free
    - update_and_free_pages_bulk
      - 87.71% free_contig_range
        - 76.02% free_unref_page
          - 41.30% free_unref_page_commit
            - 32.58% free_pcppages_bulk
              - 24.75% __free_one_page
            13.96% _raw_spin_trylock
        12.27% __update_and_free_hugetlb_folio

Perf profile after:
  Alloc
    - 99.99% alloc_pool_huge_folio
        alloc_gigantic_folio
      - alloc_contig_pages_noprof
        - 59.15% alloc_contig_range_noprof
          - 20.72% start_isolate_page_range
            20.64% prep_new_page
          - 17.13% undo_isolate_page_range

  Free
    - update_and_free_pages_bulk
      - __folio_put
        - __free_pages_ok
            7.46% free_tail_page_prepare
          - 1.97% free_one_page
              1.86% __free_one_page
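
One way to read the free-side profiles: before, the time is spent under free_contig_range and free_unref_page, which handle the range page by page; after, each folio goes through a single __folio_put. The sketch below only counts how many base pages that involves. It assumes 4 kB base pages (arm64 kernels may be built with 16 kB or 64 kB pages instead), and the base_pages helper and BASE_PAGE_KB override are names made up for illustration.

```shell
#!/bin/sh
# Assumption: 4 kB base pages; override BASE_PAGE_KB for other configurations.
BASE_PAGE_KB="${BASE_PAGE_KB:-4}"
FOLIO_KB=1048576   # from the hugepages-1048576kB sysfs directory name

# base_pages COUNT
# Prints the number of base pages backing COUNT gigantic folios.
base_pages() {
    echo $(( $1 * FOLIO_KB / BASE_PAGE_KB ))
}

base_pages 1     # 262144 base pages in one 1GB folio
base_pages 500   # 131072000 base pages across the 500-folio test
```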

Yu Zhao (3):
  mm/contig_alloc: support __GFP_COMP
  mm/cma: add cma_{alloc,free}_folio()
  mm/hugetlb: use __GFP_COMP for gigantic folios

 include/linux/cma.h     |  16 +++
 include/linux/gfp.h     |  23 ++++
 include/linux/hugetlb.h |   9 +-
 mm/cma.c                |  55 ++++++--
 mm/compaction.c         |  41 +-----
 mm/hugetlb.c            | 293 ++++++++--------------------------------
 mm/page_alloc.c         | 111 ++++++++++-----
 7 files changed, 226 insertions(+), 322 deletions(-)

--
2.46.0.76.ge559c4bf1a-goog