Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL
From: Michal Hocko
Date: Mon Oct 23 2017 - 08:41:28 EST
On Mon 23-10-17 14:22:30, C.Wehrmeyer wrote:
> On 2017-10-23 13:42, Michal Hocko wrote:
> > I do not remember any such a request either. I can see some merit in the
> > described use case. It is not specific on why hugetlb pages are used for
> > the allocator memory because that comes with its own issues.
>
> That is yet for the user to specify. As of now hugepages still require a
> special setup that not all people might have as of now - to my knowledge a
> kernel being compiled with CONFIG_TRANSPARENT_HUGEPAGE=y and a number of
> such pages being allocated either through the kernel boot line or through
CONFIG_TRANSPARENT_HUGEPAGE has nothing to do with hugetlb pages. These
are THP which do not need any special configuration and mremap works on
them.
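
To illustrate the THP point, here is a minimal sketch (not taken from any
real allocator) that grows an anonymous, THP-eligible mapping with mremap.
The function name thp_mremap_demo and the sizes are made up for
illustration; MADV_HUGEPAGE is only a hint, so the sketch ignores its
result and works whether or not the kernel actually backs the range with
huge pages.

```c
#define _GNU_SOURCE             /* mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define SZ_2M (2UL * 1024 * 1024)

/*
 * Hypothetical helper: map anonymous memory, hint that THP may back it,
 * fill it, then grow it with mremap() and check the data survived.
 * Returns 0 on success, -1 on any failure.
 */
int thp_mremap_demo(void)
{
    size_t old_size = 4 * SZ_2M;
    size_t new_size = 8 * SZ_2M;

    assert(new_size > old_size && old_size % SZ_2M == 0);

    unsigned char *p = mmap(NULL, old_size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: fails harmlessly if THP is not available. */
    (void)madvise(p, old_size, MADV_HUGEPAGE);

    memset(p, 0xab, old_size);

    /* No MAP_HUGETLB here, so this is an ordinary VMA and mremap works. */
    unsigned char *q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        munmap(p, old_size);
        return -1;
    }

    /* Old contents are preserved; the newly added tail reads as zero. */
    int ok = q[0] == 0xab && q[old_size - 1] == 0xab && q[old_size] == 0;

    munmap(q, new_size);
    return ok ? 0 : -1;
}
```

Calling thp_mremap_demo() from a small main() and checking for a zero
return is enough to try it. Nothing has to be set up in nr_hugepages
beforehand.
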
> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages. I'm deliberately
> ignoring 1-GiB pages here because those are only allocatable during boot,
> when no processes have been spawned and memory is still not fragmented.
This is no longer true. GB pages can be allocated during runtime as
well.
> My point is that I can see people not being too eager to support 1 GiB pages
> as of now unless for very specific use case.
1G or 2M pages make absolutely no difference to the mremap semantics.
It is just a pte to be updated. The problem at hand is that the hugetlb
implementation is far from straightforward and the lack of mremap is
mainly caused by implementation details (like reservations, I presume).
> 2-MiB pages, on the other hand,
> shouldn't have those limitations anymore. User-space programs should be
> capable of allocating such pages without the need for the user to fiddle
> with nr_hugepages beforehand.
And that is what we have THP for...
[...]
> With the knowledge that allocations in the Mebibyte range aren't uncommon at
> all nowadays and that one 2-MiB page eliminates the need for 512 4-KiB
> pages, we really should make advances towards treating 2-MiB pages just as
> casual as older pages. Allocators can still query if the kernel supports the
> specified page size, and specifying MAP_HUGETLB | MAP_HUGE_2MB would still
> be required in order to not break older programs, but from my perspective
> there is a lot to gain here.
I can see your sentiment here but hugetlb has never really been a fully
featured type of memory. A general purpose allocator playing with hugetlb
pages is rather tricky and I would be really cautious there. I would
rather play with THP to reduce the TLB footprint.
So by all means, mremap _should_ work with hugetlb pages but the
additional implementation and the potential complexity should have a
strong usecase. If mremap with old_size == new_size can be trivially
implemented then I am not really against it, but a full featured mremap
is not worth it IMHO.
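
For reference, the old_size == new_size case amounts to a pure move of the
mapping. Below is a rough userspace probe for it, written as a sketch
under some assumptions: the default huge page size is 2 MiB (as on
x86-64), at least one huge page is available in the pool, and the name
hugetlb_move_probe is hypothetical. Kernels of the era of this thread are
expected to return EINVAL from the mremap call; later kernels may accept
the move.

```c
#define _GNU_SOURCE             /* mremap(), MREMAP_MAYMOVE, MREMAP_FIXED */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)  /* assumed default huge page size */

/*
 * Hypothetical probe: try to move one hugetlb page to a new address with
 * old_size == new_size. Returns 0 if the move worked and the data is
 * intact, otherwise the errno of the call that failed (for example
 * ENOMEM from mmap when the pool is empty, EINVAL from mremap).
 */
int hugetlb_move_probe(void)
{
    char *src = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (src == MAP_FAILED)
        return errno;
    src[0] = 42;

    /* Reserve address space so a huge-page-aligned target can be picked. */
    size_t span = 2 * HPAGE_SIZE;
    char *hole = mmap(NULL, span, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (hole == MAP_FAILED) {
        int err = errno;
        munmap(src, HPAGE_SIZE);
        return err;
    }
    char *dst = (char *)(((uintptr_t)hole + HPAGE_SIZE - 1) &
                         ~(uintptr_t)(HPAGE_SIZE - 1));
    assert(dst + HPAGE_SIZE <= hole + span);

    /* Same old and new size plus MREMAP_FIXED: a move, not a resize. */
    int rc = 0;
    char *moved = mremap(src, HPAGE_SIZE, HPAGE_SIZE,
                         MREMAP_MAYMOVE | MREMAP_FIXED, dst);
    if (moved == MAP_FAILED) {
        rc = errno;
        munmap(src, HPAGE_SIZE);
    } else if (moved[0] != 42) {
        rc = EIO;
    }

    /* Drops the reservation and, after a successful move, the page too. */
    munmap(hole, span);
    return rc;
}
```

MREMAP_FIXED is used on purpose: without it, a same-size mremap has
nothing to do and would not exercise the move path at all.
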
--
Michal Hocko
SUSE Labs