Re: [PATCH v2] mm, hugepages: add mremap() support for hugepage backed vma

From: Mina Almasry
Date: Wed Oct 06 2021 - 15:49:40 EST


My sincere apologies for the late follow up here. I've been out for
sometime and needed some time to catch up.

On Tue, Aug 24, 2021 at 8:33 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 8/19/21 01:35, Mina Almasry wrote:
> > On Fri, Aug 13, 2021 at 4:40 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >>
> >> Adding Michal, Vlastimil and Kirill as adding this type of support
> >> was discussed in this thread:
> >> https://lore.kernel.org/linux-mm/3ba05809-63a2-2969-e54f-fd0202fe336b@xxxxxx/
> >>
> >
> > Ack. I haven't gone through that thread but if there are any concerns
> > with what I whipped up here, do let me know.
>
> Looks like there were mostly concerns with vma's growing and you're not handling
> that, so that's fine?
>

Yes, it seems that's fine.

> >> On 8/10/21 6:10 PM, Mina Almasry wrote:
> >> > Support mremap() for hugepage backed vma segment by simply repositioning
> >> > page table entries. The page table entries are repositioned to the new
> >> > virtual address on mremap().
> >> >
> >> > Hugetlb mremap() support is of course generic; my motivating use case
> >> > is a library (hugepage_text), which reloads the ELF text of executables
> >> > in hugepages. This significantly increases the execution performance of
> >> > said executables.
>
> I think this scenario has been motivation for FS support of THP, but AFAIK we're
> not there yet. Also IIRC Google had some library that did what you describe, but
> using THP's instead of hugetlbfs - might be simpler as there are no reservations
> involved.
>
> But nothing against improving mremap() support wrt hugetlbfs.
>
> > I'll yield to whatever you decide here because I reckon you have much
> > more experience and better judgement here. But my thoughts:
> >
> > 'Sane' usage of mremap() is something like:
> > 1. mmap() a hugetlbfs vma.
> > 2. Pass the vma received from step (1) to mremap() to remap it to a
> > different location.
> >
> > I don't know if there is another usage pattern I need to worry about
> > but given the above, old_addr and old_len will be hugepage aligned
> > already since they are values returned by the previous mmap() call
> > which aligns them, no? So, I think aligning old_addr and old_len to
> > the hugepage boundary is fine.
> >
> > With this support we don't allow mremap() expansion. In my use case
> > old_len==new_len acutally. I think it's fine to also align new_len to
> > the hugepage boundary
> >
> > I already have this code that errors out if the lengths are not aligned:
> >
> > if (old_len & ~huge_page_mask(h) || new_len & ~huge_page_mask(h))
> > goto out;
> >
> > I think aligning new_addr breaks my use case though. In my use case
> > new_addr is the start of the text segment in the ELF executable, and I
> > don't think that's guaranteed to be anything but page aligned.
>
> Hm, I have a vague (possibly wrong) recollection that Andrea mentioned he always
> planned text to be in THPs so he made sure ELF text sections are aligned as
> such. I guess there's a way to declare alignment in ELF and it depends on
> whether your distro's linker is set up to ask for hugepage-sized one?
>

Sincere apologies to Vlastimil and Mike for my misinformation here as
well. You are completely correct in that my v3 patch return EINVAL if
the new_addr was not already hugepage aligned. It turns out
unbeknownst to me our userspace code was already handling that and
making sure only to remap to an address that was already hugepage
aligned. So, for my use case, either erroring out if new_addr is not
hugepage aligned or aligning new_addr works fine for me. Mike has
indicated he prefers the kernel aligning new_addr for consistency with
how mmap() works, so I've uploaded v4 with that change. I've also
updated the tests to test that case.

As far as I can tell that was the only pending issue with the previous
reviews. Please take another look. Thanks!