Re: [PATCH 02/10] mm/hugetlb: add region_del() to delete a specific range of entries

From: Mike Kravetz
Date: Fri Jul 03 2015 - 13:23:15 EST


On 07/03/2015 12:20 AM, Hillf Danton wrote:
fallocate hole punch will want to remove a specific range of pages.
The existing region_truncate() routine deletes all region/reserve
map entries after a specified offset. region_del() will provide
this same functionality if the end of region is specified as -1.
Hence, region_del() can replace region_truncate().

Unlike region_truncate(), region_del() can return an error in the
rare case where it can not allocate memory for a region descriptor.
This ONLY happens in the case where an existing region must be split.
Current callers passing -1 as end of range will never experience
this error and do not need to deal with error handling. Future
callers of region_del() (such as fallocate hole punch) will need to
handle this error.

Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
---
mm/hugetlb.c | 101 ++++++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 75 insertions(+), 26 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 189c0d7..e8c7f68 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -460,43 +460,92 @@ static void region_abort(struct resv_map *resv, long f, long t)
}

/*
- * Truncate the reserve map at index 'end'. Modify/truncate any
- * region which contains end. Delete any regions past end.
- * Return the number of huge pages removed from the map.
+ * Delete the specified range [f, t) from the reserve map. If the
+ * t parameter is -1, this indicates that ALL regions after f should
+ * be deleted. Locate the regions which intersect [f, t) and either
+ * trim, delete or split the existing regions.
+ *
+ * Returns the number of huge pages deleted from the reserve map.
+ * In the normal case, the return value is zero or more. In the
+ * case where a region must be split, a new region descriptor must
+ * be allocated. If the allocation fails, -ENOMEM will be returned.
+ * NOTE: If the parameter t == -1, then we will never split a region
+ * and possibly return -ENOMEM. Callers specifying t == -1 do not
+ * need to check for -ENOMEM error.
*/
-static long region_truncate(struct resv_map *resv, long end)
+static long region_del(struct resv_map *resv, long f, long t)
{
struct list_head *head = &resv->regions;
struct file_region *rg, *trg;
- long chg = 0;
+ struct file_region *nrg = NULL;
+ long del = 0;

+ if (t == -1)
+ t = LONG_MAX;

Why not use

See below

+retry:
spin_lock(&resv->lock);
- /* Locate the region we are either in or before. */
- list_for_each_entry(rg, head, link)
- if (end <= rg->to)
+ list_for_each_entry_safe(rg, trg, head, link) {
+ if (rg->to <= f)
+ continue;
+ if (rg->from >= t)
break;
- if (&rg->link == head)
- goto out;

- /* If we are in the middle of a region then adjust it. */
- if (end > rg->from) {
- chg = rg->to - end;
- rg->to = end;
- rg = list_entry(rg->link.next, typeof(*rg), link);
- }
+ if (f > rg->from && t < rg->to) { /* Must split region */
+ /*
+ * Check for an entry in the cache before dropping
+ * lock and attempting allocation.
+ */
+ if (!nrg &&
+ resv->rgn_cache_count > resv->adds_in_progress) {
+ nrg = list_first_entry(&resv->rgn_cache,
+ struct file_region,
+ link);
+ list_del(&nrg->link);
+ resv->rgn_cache_count--;
+ }

- /* Drop any remaining regions. */
- list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
- if (&rg->link == head)
+ if (!nrg) {
+ spin_unlock(&resv->lock);
+ nrg = kmalloc(sizeof(*nrg), GFP_KERNEL);
+ if (!nrg)
+ return -ENOMEM;
+ goto retry;
+ }
+
+ del += t - f;
+
+ /* New entry for end of split region */
+ nrg->from = t;
+ nrg->to = rg->to;
+ INIT_LIST_HEAD(&nrg->link);
+
+ /* Original entry is trimmed */
+ rg->to = f;
+
+ list_add(&nrg->link, &rg->link);
+ nrg = NULL;
break;
- chg += rg->to - rg->from;
- list_del(&rg->link);
- kfree(rg);
+ }
+
+ if (f <= rg->from && t >= rg->to) { /* Remove entire region */
+ del += rg->to - rg->from;
+ list_del(&rg->link);
+ kfree(rg);
+ continue;
+ }
+
+ if (f <= rg->from) { /* Trim beginning of region */
+ del += t - rg->from;
+ rg->from = t;
+ } else { /* Trim end of region */
+ del += rg->to - f;
+ rg->to = f;
+ }
}

-out:
spin_unlock(&resv->lock);
- return chg;
+ kfree(nrg);
+ return del;
}

/*
@@ -647,7 +696,7 @@ void resv_map_release(struct kref *ref)
struct file_region *rg, *trg;

/* Clear out any active regions before we release the map. */
- region_truncate(resv_map, 0);
+ region_del(resv_map, 0, -1);

LONG_MAX is not selected, why?

I think your question is "Why not use LONG_MAX to indicate all
remaining regions in the map should be deleted (a truncate
operation)?".

The two values passed to the routine are essentially huge page
offsets. My first thought when creating the routine is that a
negative value (such as -1) is an invalid offset and a good
choice to indicate a "special" case operation. LONG_MAX is also
an invalid huge page offset, so it could be used for the same
purpose and avoid the "if (t == -1) t = LONG_MAX" at the
beginning of the routine. I have no objections in changing this
to LONG_MAX is if makes the code simpler/easier to understand.

--
Mike Kravetz


/* ... and any entries left in the cache */
list_for_each_entry_safe(rg, trg, head, link)
@@ -3868,7 +3917,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed)
long gbl_reserve;

if (resv_map)
- chg = region_truncate(resv_map, offset);
+ chg = region_del(resv_map, offset, -1);
spin_lock(&inode->i_lock);
inode->i_blocks -= (blocks_per_huge_page(h) * freed);
spin_unlock(&inode->i_lock);
--
2.1.0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/