Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

From: C.Wehrmeyer
Date: Mon Oct 23 2017 - 08:23:21 EST


On 2017-10-23 13:42, Michal Hocko wrote:
> I do not remember any such request either. I can see some merit in the
> described use case. It is not specific about why hugetlb pages are used for
> the allocator memory, which comes with its own issues.

That is still for the user to decide. As of now, hugepages require a special setup that not everyone may have: to my knowledge, a kernel compiled with hugetlbfs support (CONFIG_HUGETLBFS=y) and a number of such pages reserved either on the kernel command line or through /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages. I'm deliberately ignoring 1-GiB pages here because those can realistically only be allocated during boot, when no processes have been spawned yet and memory is not yet fragmented.

My point is that I can see people not being too eager to support 1-GiB pages as of now except for very specific use cases. 2-MiB pages, on the other hand, shouldn't have those limitations anymore: user-space programs should be able to allocate such pages without the user having to fiddle with nr_hugepages beforehand.

Some time ago I wrote some code to detect the TLB capabilities of my current test CPU; these are the results:

[TLB] Instruction TLB: 2M/4M pages, fully associative, 8 entries
[TLB] Data TLB: 4 KByte pages, 4-way set associative, 64 entries
[TLB] Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
[TLB] Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
[STLB] Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries

Given that allocations in the mebibyte range are not uncommon at all nowadays, and that one 2-MiB page replaces 512 4-KiB pages (and thus needs one TLB entry where 512 would otherwise be needed), we really should move towards treating 2-MiB pages as casually as regular 4-KiB pages. Allocators could still query whether the kernel supports the requested page size, and specifying MAP_HUGETLB | MAP_HUGE_2MB would still be required so as not to break older programs, but from my perspective there is a lot to gain here.