Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned

From: David Rientjes
Date: Thu Mar 26 2015 - 15:15:31 EST


On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > I looked at this thread at http://marc.info/?t=141392508800001 since I
> > didn't have it in my mailbox, and I didn't get a chance to actually run
> > your test code.
> >
> > In short, I think what you're saying is that
> >
> > ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
> > munmap(ptr, 4KB) == EINVAL
>
> I am not sure you have read the email correctly:
>
> munmap(mmap(size, HUGETLB), size) = EFAIL
>
> For every size not multiple of the huge page size.
> Whereas:
>
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
>

Yes, I read it correctly, and wrote how your test case should have failed
above. It fails when you do the 4KB mmap() with MAP_HUGETLB and pass 4KB
to munmap(), correct?

I have no idea what EFAIL is, though.

> > The question you pose is whether munmap(ptr, 4KB) should succeed for a
> > hugetlb vma and in your patch you align this to the hugepage size of the
> > vma in the same manner that munmap(ptr, 2KB) would be aligned to PAGE_SIZE
> > for a non-hugetlb vma.
> >
> > The munmap() spec says the whole pages that include any part of the passed
> > length should be unmapped. In spirit, I would agree with you that the
> > page size for the vma is the hugepage size so that would be what would be
> > unmapped.
> >
> > But that's going by a spec that doesn't address hugepages and is worded in
> > a way that {PAGE_SIZE} is the base unit that both mmap() and munmap() is
> > done. It carries no notion of variable page sizes and how hugepages
> > should be handled with respect to pages of {PAGE_SIZE} length. So I think
> > this is beyond the scope of the spec: any length is aligned to PAGE_SIZE,
> > but the munmap() behavior for hugetlb vmas is not restricted.
> >
> > It would seem too dangerous at this point to change the behavior of
> > munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs could actually
> > arise from aligning to the hugepage size.
>
> You mean, there is an harder failure than the current failure? :)
>

Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a
hugetlb vma is long standing and there may be applications that break as a
result of changing the behavior: a database that reserves all allocated
hugetlb memory with mmap() so that it always has exclusive access to those
hugepages, whether they are faulted or not, and maintains its own hugepage
pool (which is common), may test the return value of munmap() and depend
on it returning -EINVAL to determine if it is freeing memory that was
either dynamically allocated or mapped from the hugetlb reserved pool.

> > Some applications purposefully reserve hugetlb pages by mmap() and never
> > munmap() them so they have exclusive access to hugepages that were
> > allocated either at boot or runtime by the sysadmin. If they depend on
> > the return value of munmap() to determine if memory to free is memory
> > dynamically allocated by the application or reserved as hugetlb memory,
> > then this would cause them to break. I can't say for certain that no such
> > application exists.
>
> The fact that certain applications will seldomly call an API, should be no
> reason for API to have bugs, or at the very least, a bahviour which not
> only in not documented in the man pages, but also totally unrespectful of
> the normal mmap/munmap semantics.

If we were to go back in time and decide this when the munmap() behavior
for hugetlb vmas was originally introduced, that would be valid. The
problem is that it could lead to userspace breakage and that's a
non-starter.

What we can do is improve the documentation and man-page to clearly
specify the long-standing behavior so that nobody encounters unexpected
results in the future.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/