Re: [PATCH RFC] hugetlbfs 'noautofill' mount option

From: Prakash Sangappa
Date: Tue May 02 2017 - 12:07:11 EST




On 5/2/17 3:53 AM, Anshuman Khandual wrote:
On 05/01/2017 11:30 PM, Prakash Sangappa wrote:
Some applications, such as databases, use hugetlbfs for performance
reasons. Files on a hugetlbfs filesystem are created, and huge pages
allocated, using the fallocate() API. Pages are deallocated/freed using
the fallocate() hole punching support that has been added to hugetlbfs.
These files are mmapped and accessed by many processes as shared memory.
Such applications keep track of which offsets in the hugetlbfs file have
pages allocated.

Any access to mapped address over holes in the file, which can occur due
s/mapped/unmapped/ ^ ?

No, it is a 'mapped' address.


to bugs in the application, is considered invalid and expect the process
to simply receive a SIGBUS. However, currently when a hole in the file is
accessed via the mapped address, kernel/mm attempts to automatically
allocate a page at page fault time, resulting in implicitly filling the
hole
But this is expected when you try to control the file allocation from
a mapped address. Any changes while walking past or writing the range
in the memory mapped should reflect exactly in the file on the disk.
Why is it not a valid behavior?
Sure, that is a valid behavior. However, hugetlbfs is a pseudo filesystem
and its purpose is to let applications use hugepage memory. The contents
of this filesystem are not backed by disk, nor are they swapped out.

The proposed new behavior applies only to hugetlbfs filesystems mounted
with the new 'noautofill' mount option. Page allocation and freeing for the
file are managed using the 'fallocate()' API.

For hugetlbfs filesystems mounted without this option, there is no change in behavior.

in the file. This may not be the desired behavior for applications like the
database that want to explicitly manage page allocations of hugetlbfs
files.

This patch adds a new hugetlbfs mount option 'noautofill', to indicate that
pages should not be allocated at page fault time when accessed through an
mmapped address.
When should the page be allocated for mapping?
The application would allocate/free file pages using the fallocate() API.
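
As a rough sketch of that usage (not code from the patch), the two
fallocate() calls involved look like the following. The helper names
range_alloc(), range_free() and run_demo() are made up for illustration,
and the chunk size is an assumption; on hugetlbfs, offsets and lengths
would need to be multiples of the huge page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: explicitly allocate pages for [offset, offset+len). */
int range_alloc(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    return fallocate(fd, 0, offset, len);
}

/* Hypothetical helper: free pages by punching a hole; file size is kept. */
int range_free(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/*
 * Hypothetical demo: allocate two chunks, then free the second one.
 * Returns 0 if every step succeeded and the file size is unchanged
 * by the hole punch, -1 otherwise.
 */
int run_demo(const char *path, off_t chunk)
{
    struct stat st;
    int ok = -1;
    int fd = open(path, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return -1;

    if (range_alloc(fd, 0, 2 * chunk) == 0 &&
        range_free(fd, chunk, chunk) == 0 &&
        fstat(fd, &st) == 0 &&
        st.st_size == 2 * chunk)
        ok = 0;

    close(fd);
    return ok;
}
```

With the proposed 'noautofill' option, a later access through the mapping
to the punched range would be expected to raise SIGBUS rather than
allocate a page; without the option, the fault fills the hole as it
does today.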