Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned

From: David Rientjes
Date: Thu Mar 26 2015 - 16:03:21 EST


On Thu, 26 Mar 2015, Davide Libenzi wrote:

> > Yes, this munmap() behavior of lengths <= hugepage_size - PAGE_SIZE for a
> > hugetlb vma is long standing and there may be applications that break as a
> > result of changing the behavior: a database that reserves all allocated
> > hugetlb memory with mmap() so that it always has exclusive access to those
> > hugepages, whether they are faulted or not, and maintains its own hugepage
> > pool (which is common), may test the return value of munmap() and depend
> > on it returning -EINVAL to determine if it is freeing memory that was
> > either dynamically allocated or mapped from the hugetlb reserved pool.
>
> You went a long way to create such a case.
> But, in your case, that application will erroneously considering hugepage
> mmaped memory, as dynamically allocated, since it will always get EINVAL,
> unless it passes an aligned size. Aligned size, which a fix like the one
> posted in the patch will still leave as success.

There was a patch proposed last week to add reserved pools to the
hugetlbfs mount option specifically for the case where a large database
wants sole reserved access to the hugepage pool. This is why hugetlbfs
pages become reserved on mmap(). In that case, the database never wants
to do munmap() and instead maintains its own hugepage pool.

That makes the usual database case, mmap() all necessary hugetlb pages to
reserve them, even easier since they have historically had to maintain
this pool amongst various processes.

Is there a process out there that tests for munmap(ptr) == EINVAL and, if
true, returns ptr to its hugepage pool? I can't say for certain that none
exist, that's why the potential for breakage exists.

> OTOH, an application, which might be more common than the one you posted,
> which calls munmap() to release a pointer which it validly got from a
> previous mmap(), will leak huge pages as all the issued munmaps will fail.
>

That application would have to be ignoring an EINVAL return value.

> > If we were to go back in time and decide this when the munmap() behavior
> > for hugetlb vmas was originally introduced, that would be valid. The
> > problem is that it could lead to userspace breakage and that's a
> > non-starter.
> >
> > What we can do is improve the documentation and man-page to clearly
> > specify the long-standing behavior so that nobody encounters unexpected
> > results in the future.
>
> This way you will leave the mmap API with broken semantics.
> In any case, I am done arguing.
> I will leave to Andrew to sort it out, and to Michael Kerrisk to update
> the mmap man pages with the new funny behaviour.
>

The behavior is certainly not new, it has always been the case for
munmap() on hugetlb vmas.

In a strict POSIX interpretation, it refers only to pages in the sense of
what is returned by sysconf(_SC_PAGESIZE). Such vmas are not backed by
any pages of size sysconf(_SC_PAGESIZE), so this behavior is undefined.
It would be best to modify the man page to explicitly state this for
MAP_HUGETLB.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/