Re: [PATCH 0/4] hugetlbfs: optionally reserve all fs pages at mount time

From: David Rientjes
Date: Wed Mar 04 2015 - 00:50:04 EST


On Tue, 3 Mar 2015, Mike Kravetz wrote:

> hugetlbfs allocates huge pages from the global pool as needed. Even if
> the global pool contains a sufficient number of pages for the filesystem
> size at mount time, those global pages could be grabbed for some other
> use. As a result, filesystem huge page allocations may fail due to lack
> of pages.
>
> Applications such as a database want to use huge pages for performance
> reasons. hugetlbfs filesystem semantics with ownership and modes work
> well to manage access to a pool of huge pages. However, the application
> would like some reasonable assurance that allocations will not fail due
> to a lack of huge pages. At application startup time, the application
> would like to configure itself to use a specific number of huge pages.
> Before starting, the application can check to make sure that enough
> huge pages exist in the system global pools. What the application wants
> is exclusive use of a subpool of huge pages.
>
> Add a new hugetlbfs mount option 'reserved' to specify that the number
> of pages associated with the size of the filesystem will be reserved. If
> there are insufficient pages, the mount will fail. The reservation is
> maintained for the duration of the filesystem so that as pages are
> allocated and freed a sufficient number of pages remains reserved.
>

This functionality is somewhat limited because it's not possible to
reserve a subset of the size for a single mount point; it's either all or
nothing. It shouldn't be too difficult to just add a reserved=<value>
option where <value> is <= size. If it's done that way, you should be
able to omit size= entirely for unlimited hugepages but always ensure that
a low watermark of hugepages is reserved for the database.
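
To make the suggested semantics concrete, here is a rough, untested
userspace sketch of the check I have in mind. None of this is from the
patch series: struct mount_opts, NO_LIMIT and opts_valid() are made-up
names for illustration, and all values are counted in huge pages.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "size= was omitted" (no upper bound). */
#define NO_LIMIT (-1L)

/* Hypothetical parsed mount options, in units of huge pages. */
struct mount_opts {
	long size_pages;     /* upper bound, or NO_LIMIT */
	long reserved_pages; /* low watermark; 0 means nothing reserved */
};

/*
 * Return true if a mount with these options could proceed:
 *  - reserved must not be negative,
 *  - reserved must be <= size whenever size is given,
 *  - the global pool must have enough free pages to back the reserve.
 */
static bool opts_valid(const struct mount_opts *o, long free_global_pages)
{
	if (o->reserved_pages < 0)
		return false;
	if (o->size_pages != NO_LIMIT && o->reserved_pages > o->size_pages)
		return false;
	return o->reserved_pages <= free_global_pages;
}
```

With this, reserved alone (no size=) gives an unlimited filesystem with a
guaranteed floor, and reserved == size is the all-or-nothing behavior the
series implements today.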

> Comments from RFC addressed/incorporated
>
> Mike Kravetz (4):
>   hugetlbfs: add reserved mount fields to subpool structure
>   hugetlbfs: coordinate global and subpool reserve accounting
>   hugetlbfs: accept subpool reserved option and setup accordingly
>   hugetlbfs: document reserved mount option
>
>  Documentation/vm/hugetlbpage.txt | 18 ++++++++------
>  fs/hugetlbfs/inode.c             | 15 ++++++++++--
>  include/linux/hugetlb.h          |  7 ++++++
>  mm/hugetlb.c                     | 53 +++++++++++++++++++++++++++++++++-------
>  4 files changed, 75 insertions(+), 18 deletions(-)