Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL
From: Michal Hocko
Date: Mon Oct 23 2017 - 07:42:22 EST
On Fri 20-10-17 15:42:25, Mike Kravetz wrote:
> On 10/19/2017 12:34 AM, C.Wehrmeyer wrote:
[...]
> > As for the specific use case: I've written my own allocator that is
> > not bound on the same limitations that usual malloc/realloc/free
> > allocators are bound. As such I want to be able to eliminate as many
> > page walks as possible.
> >
> > Just excepting the limitation would put Linux down on the same level
> > as the Windows API, where no VirtualRealloc exists. My allocator
> > needs to work with Linux and Windows; for the latter one I'm already
> > managing a table of consecutive mappings in user-space that, if
> > a relocation has to be made, creates an entirely new mapping
> > into which the data of the previous mappings is copied. This is
> > redundant, because the kernel and the process keep their own copies
> > of the mapping table, and this is slow because the kernel could just
> > re-adjust the position within the address space, whereas the process
> > has to memcpy all the data from the old to the new mappings.
> >
> > Those are the very problems mremap was supposed to remove in the
> > first place. Making the limitation documented is the lazy way that
> > will force implementers to workaround it.
>
> mremap has never supported moving or growing hugetlb mappings. Someone
> (before git history) added this explicit check to the mremap code. Perhaps
> it was done when huge page support was introduced?
yes, that is the case.
> I am of the opinion that we should simply document this limitation. AFAIK,
> this this the first time anyone has asked about it in 15 years. What is the
> opinion of others?
I do not remember any such a request either. I can see some merit in the
described use case. It is not specific on why hugetlb pages are used for
the allocator memory because that comes with it own issues. If somebody
is really thrilled enough to implement this the remapping feature for
hugetlb I wouldn't be opposed as long as the implementation is clean and
wouldn't add an additional mess to the code base. I suspect that the vma
enlarging might be a hard deal. Anyway starting with a documentation
update sounds like a good thing anyway. In any case such a feature will
be available only for new kernels so people should be aware of the state
on older kernels.
--
Michal Hocko
SUSE Labs