Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL

From: C.Wehrmeyer
Date: Mon Oct 23 2017 - 12:47:47 EST


On 23-10-17 18:13, Michal Hocko wrote:
On Mon 23-10-17 16:00:13, C.Wehrmeyer wrote:
And just to be very sure I've added:

if (madvise(buf1,ALLOC_SIZE_1,MADV_HUGEPAGE)) {
errno_tmp = errno;
fprintf(stderr,"madvise: %u\n",errno_tmp);
goto out;
}

/*Make sure the mapping is actually used*/
memset(buf1,'!',ALLOC_SIZE_1);

Is the buffer aligned to 2MB?

When I omit MAP_HUGETLB for the flags that mmap receives - no.

#define ALLOC_SIZE_1 (2 * 1024 * 1024)
[...]
buf1 = mmap (
NULL,
ALLOC_SIZE_1,
prot, /*PROT_READ | PROT_WRITE*/
flags /*MAP_PRIVATE | MAP_ANONYMOUS*/,
-1,
0
);

In such a case buf1 usually contains addresses which are aligned only to 4 KiB, such as 0x7f07d76e9000. 2-MiB-aligned addresses, such as 0x7f89f5e00000, are only produced with MAP_HUGETLB, which, if I understood the documentation correctly, defeats the point of THPs, as they are supposed to be transparent.

I'm not exactly sure how I'm supposed to force mmap to give me any other kind of address, if that is going to be your suggestion - unless I'd read the mapping configuration for the current process and find myself a spot where I can tell mmap to create a mapping for me using MAP_FIXED. But that wouldn't be transparent, either.
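
For reference, one common way to get a 2-MiB-aligned anonymous mapping without MAP_HUGETLB or MAP_FIXED is to over-allocate by one huge page, round the start up, and unmap the excess. The following is only a sketch under assumptions: the 2 MiB huge page size (x86-64) is hard-coded, and mmap_thp_aligned and HPAGE_SIZE are hypothetical names that do not appear in my test program. The madvise call is still only a hint, so this does not guarantee that a THP is actually used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB transparent huge pages, as on x86-64. */
#define HPAGE_SIZE ((size_t)2 * 1024 * 1024)

/*
 * Hypothetical helper: map `len` bytes of anonymous memory starting at a
 * HPAGE_SIZE-aligned address. `len` must be a non-zero multiple of
 * HPAGE_SIZE. Returns MAP_FAILED on error.
 */
static void *mmap_thp_aligned(size_t len)
{
    assert(len > 0 && len % HPAGE_SIZE == 0);

    /* Over-allocate so an aligned window of `len` bytes must fit inside. */
    size_t padded = len + HPAGE_SIZE;
    unsigned char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return MAP_FAILED;

    /* Round the start address up to the next HPAGE_SIZE boundary. */
    uintptr_t start = ((uintptr_t)raw + HPAGE_SIZE - 1)
                      & ~(uintptr_t)(HPAGE_SIZE - 1);
    size_t head = start - (uintptr_t)raw;   /* unused bytes in front */
    size_t tail = padded - head - len;      /* unused bytes behind */

    /* Give the unused parts back; both are multiples of the 4 KiB page. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((unsigned char *)start + len, tail);

    /* Advisory only: the kernel may still back the range with 4 KiB pages. */
    (void)madvise((void *)start, len, MADV_HUGEPAGE);
    return (void *)start;
}
```

With this, buf1 = mmap_thp_aligned(ALLOC_SIZE_1) would replace the plain mmap call above, and the later munmap(buf1, ALLOC_SIZE_1) stays unchanged. Whether that counts as "transparent" is debatable, since the application still has to do the alignment itself.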

/*Give me time for monitoring*/
sleep(2000);

right after the mmap call. I've also made sure that nothing is being
optimised away by the compiler. With a 2MiB mapping being requested this
should be a good opportunity for the kernel, and yet when I try to figure
out how many THPs my process uses:

$ cat /proc/21986/smaps | grep 'AnonHuge'

I just end up with lots of:

AnonHugePages: 0 kB

And cat /proc/meminfo | grep 'Huge' doesn't change significantly either. Am
I just doing something wrong here, or shouldn't I trust the THP mechanisms
to actually allocate hugepages for me?

If the mapping is aligned properly then the rest is up to the system and
availability of large physically contiguous memory blocks.

I have about 5 GiB of free memory right now. I cannot rule out that memory fragmentation keeps the kernel from using THP, but manually reserving 256 2-MiB pages through nr_hugepages and then freeing them works just fine. Yes, after allocating them I checked that nr_hugepages actually was 256. And yet, when I ran my program immediately afterwards, none of the AnonHugePages entries that smaps exports changed. Also (while omitting MAP_HUGETLB) buf1 is still aligned only to 4 KiB.
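
For repeated checks, the AnonHugePages entries can also be summed from inside the process instead of grepping smaps by hand. A minimal sketch, assuming the usual Linux smaps line format ("AnonHugePages: <n> kB"); anon_huge_kb is a hypothetical helper name, not part of my test program:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: add up every "AnonHugePages:" entry in the given
 * smaps file (e.g. "/proc/self/smaps"). Returns the total in kB, or -1 if
 * the file cannot be opened.
 */
static long anon_huge_kb(const char *smaps_path)
{
    assert(smaps_path != NULL);

    FILE *f = fopen(smaps_path, "r");
    if (!f)
        return -1;

    char line[512];
    long total = 0;
    long kb;

    while (fgets(line, sizeof line, f)) {
        /* Matches only lines that start with the AnonHugePages field. */
        if (sscanf(line, "AnonHugePages: %ld kB", &kb) == 1)
            total += kb;
    }

    fclose(f);
    return total;
}
```

Calling anon_huge_kb("/proc/self/smaps") right after the memset would show whether any part of the process is backed by THPs at that point; a result of 0 corresponds to the grep output quoted above.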