Re: RFC: Block reservation for hugetlbfs

From: Dave Hansen
Date: Tue Feb 21 2006 - 14:23:29 EST


On Tue, 2006-02-21 at 13:21 +1100, David Gibson wrote:
> The patch below is a draft attempt to address this problem, by
> strictly reserving a number of physical hugepages for hugepages inodes
> which have been mapped, but not instatiated. MAP_SHARED mappings are
> thus "safe" - they will fail on mmap(), not later with a SIGBUS.
> MAP_PRIVATE mappings can still SIGBUS.
...
> + read_lock_irq(&inode->i_mapping->tree_lock);
> + for (pg = inode->i_blocks; pg < npages; pg++) {
> + page = radix_tree_lookup(&mapping->page_tree, pg);
> + if (! page)
> + change_in_reserve++;
> + }
> +
> + for (pg = npages; pg < inode->i_blocks; pg++) {
> + page = radix_tree_lookup(&mapping->page_tree, pg);
> + if (! page)
> + change_in_reserve--;
> + }
> + spin_lock(&hugetlb_lock);

I'm a bit confused by this. The for loops goes through and looks for
pages that have indexes greater than the current i_blocks, but less than
the number of pages requested by this reservation. With demand
faulting, can there be holes in the file? Will this code account for
them?

I think this would also be a bit more clear if the two for loops were a
bit more explicit in what they are doing. The first is growing the
reservation when "npages > inode->i_blocks" and the second is shrinking
the reservation when "npages < inode->i_blocks", right? Is this
completely clear from the code? ;)

Also, since the operation you are performing is actually counting pages
in the radix tree at certain indexes, would it be sane to have a patch
such as the (completely untested) attached one to do just that, but in
the radix code?

-- Dave



---

memhotplug-dave/lib/radix-tree.c | 19 +++++++++++++++++--
1 files changed, 17 insertions(+), 2 deletions(-)

diff -puN lib/radix-tree.c~radix-tree-count lib/radix-tree.c
--- memhotplug/lib/radix-tree.c~radix-tree-count 2006-02-21 10:50:25.000000000 -0800
+++ memhotplug-dave/lib/radix-tree.c 2006-02-21 10:51:00.000000000 -0800
@@ -538,7 +538,9 @@ __lookup(struct radix_tree_root *root, v
for (i = index & RADIX_TREE_MAP_MASK; i < RADIX_TREE_MAP_SIZE; i++) {
index++;
if (slot->slots[i]) {
- results[nr_found++] = slot->slots[i];
+ if (results)
+ results[nr_found] = slot->slots[i];
+ nr_found++;
if (nr_found == max_items)
goto out;
}
@@ -559,6 +561,9 @@ out:
* them at *@results and returns the number of items which were placed at
* *@results.
*
+ * If *@results is NULL, the function will return a count of pages found,
+ * without any actual results.
+ *
* The implementation is naive.
*/
unsigned int
@@ -572,10 +577,13 @@ radix_tree_gang_lookup(struct radix_tree
while (ret < max_items) {
unsigned int nr_found;
unsigned long next_index; /* Index of next search */
+ void *cur_result = NULL;

if (cur_index > max_index)
break;
- nr_found = __lookup(root, results + ret, cur_index,
+ if (results)
+ cur_result = results + ret;
+ nr_found = __lookup(root, cur_result, cur_index,
max_items - ret, &next_index);
ret += nr_found;
if (next_index == 0)
@@ -586,6 +594,13 @@ radix_tree_gang_lookup(struct radix_tree
}
EXPORT_SYMBOL(radix_tree_gang_lookup);

+unsigned int
+radix_tree_lookup_count(struct radix_tree_root *root
+ unsigned long first_index)
+{
+ return radix_tree_gang_lookup(root, NULL, first_index, ~0);
+}
+
/*
* FIXME: the two tag_get()s here should use find_next_bit() instead of
* open-coding the search.
_