[PATCH 1/2] lumpy reclaim: clean up and rewrite lumpy reclaim

From: KAMEZAWA Hiroyuki
Date: Wed Jun 10 2009 - 01:26:29 EST

Next in thread: KAMEZAWA Hiroyuki: "[PATCH 2/2] memcg: fix LRU rotation at __isolate_page"

I think lumpy reclaim should be updated to match the current split-LRU design.
This patch includes a bugfix and a cleanup. What do you think?

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>

In lumpy reclaim, "cursor_page" is found just by pfn, so we don't know
which LRU list it came from. Putting it back on the "src" list is therefore a bug.
Also, as pointed out, current lumpy reclaim doesn't seem to
work as originally designed and is a bit complicated. This patch adds a
function, try_lumpy_reclaim(), and rewrites the logic.

The major changes from current lumpy reclaim are:
- check migratetype before an aggressive retry on failure.
- check PG_unevictable on failure.
- scan in buddy system order. This helps create
a lump around the targeted page. We create contiguous pages for the buddy
allocator as far as we can _around_ the reclaim target page.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
---
mm/vmscan.c | 120 +++++++++++++++++++++++++++++++++++-------------------------
1 file changed, 71 insertions(+), 49 deletions(-)
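
To illustrate the buddy-order scan described above, here is a minimal
userspace sketch, not part of the patch. It only computes which pfn ranges
would be visited, assuming every page in every range can be isolated. The
names SKETCH_MAX_ORDER, struct buddy_range and buddy_scan_ranges() are
hypothetical, and the value 11 for the maximum order is an assumption (the
kernel value depends on configuration).

```c
/* Assumed value for this sketch only; the kernel's MAX_ORDER is configurable. */
#define SKETCH_MAX_ORDER 11

/* One buddy range visited by the scan: [start, end). */
struct buddy_range {
	unsigned long start;
	unsigned long end;
};

/*
 * Fill out[] with the ranges scanned for orders 0 .. request_order - 1,
 * assuming each range is taken completely. out[] must have room for
 * request_order entries. Returns the number of ranges written.
 */
static int buddy_scan_ranges(unsigned long page_pfn, int request_order,
			     struct buddy_range *out)
{
	/* start of the maximum-order aligned region containing the page */
	unsigned long base = page_pfn & ~((1UL << SKETCH_MAX_ORDER) - 1);
	int order;

	for (order = 0; order < request_order; ++order) {
		/* offset of the current block inside the region */
		unsigned long page_idx = page_pfn - base;
		/* the buddy of a block differs from it only in bit 'order' */
		unsigned long buddy_idx = page_idx ^ (1UL << order);

		out[order].start = base + buddy_idx;
		out[order].end = out[order].start + (1UL << order);

		/* the merged block starts at the lower of the two buddies */
		if (out[order].start < page_pfn)
			page_pfn = out[order].start;
	}
	return request_order;
}
```

For page_pfn = 5125 and request_order = 3, this yields the ranges
[5124, 5125), [5126, 5128) and [5120, 5124). Together with the target page
they cover the order-3 aligned block [5120, 5128), which is the "lump" the
scan tries to build around the target.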

Index: mmotm-2.6.30-Jun10/mm/vmscan.c
===================================================================
--- mmotm-2.6.30-Jun10.orig/mm/vmscan.c
+++ mmotm-2.6.30-Jun10/mm/vmscan.c
@@ -850,6 +850,69 @@ int __isolate_lru_page(struct page *page
return ret;
}

+static int
+try_lumpy_reclaim(struct page *page, struct list_head *dst, int request_order)
+{
+ unsigned long buddy_base, buddy_idx, buddy_start_pfn, buddy_end_pfn;
+ unsigned long pfn, page_pfn, page_idx;
+ int zone_id, order, type;
+ int do_aggressive = 0;
+ int nr = 0;
+ /*
+ * Lumpy reclaim. Try to take nearby pages in the requested order to
+ * create free contiguous pages. This algorithm starts
+ * from order 0 and scans buddy pages up to request_order.
+ * If you are unsure about the buddy position calculation, please see
+ * mm/page_alloc.c
+ */
+ zone_id = page_zone_id(page);
+ page_pfn = page_to_pfn(page);
+ buddy_base = page_pfn & ~((1 << MAX_ORDER) - 1);
+
+ /* Can we expect successful reclaim? */
+ type = get_pageblock_migratetype(page);
+ if ((type == MIGRATE_MOVABLE) || (type == MIGRATE_RECLAIMABLE))
+ do_aggressive = 1;
+
+ for (order = 0; order < request_order; ++order) {
+ /* offset in this buddy region */
+ page_idx = page_pfn & ~buddy_base;
+ /* offset of buddy can be calculated by xor */
+ buddy_idx = page_idx ^ (1 << order);
+ buddy_start_pfn = buddy_base + buddy_idx;
+ buddy_end_pfn = buddy_start_pfn + (1 << order);
+
+ /* scan range [buddy_start_pfn...buddy_end_pfn) */
+ for (pfn = buddy_start_pfn; pfn < buddy_end_pfn; ++pfn) {
+ /* Avoid holes within the zone. */
+ if (unlikely(!pfn_valid_within(pfn)))
+ break;
+ page = pfn_to_page(pfn);
+ /*
+ * Check that we have not crossed a zone boundary.
+ * Some arch have zones not aligned to MAX_ORDER.
+ */
+ if (unlikely(page_zone_id(page) != zone_id))
+ break;
+
+ /* we are always under ISOLATE_BOTH */
+ if (__isolate_lru_page(page, ISOLATE_BOTH, 0) == 0) {
+ list_move(&page->lru, dst);
+ nr++;
+ } else if (do_aggressive && !PageUnevictable(page))
+ continue;
+ else
+ break;
+ }
+ /* we can't refill this order */
+ if (pfn != buddy_end_pfn)
+ break;
+ if (buddy_start_pfn < page_pfn)
+ page_pfn = buddy_start_pfn;
+ }
+ return nr;
+}
+
/*
* zone->lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
@@ -875,14 +938,10 @@ static unsigned long isolate_lru_pages(u
unsigned long *scanned, int order, int mode, int file)
{
unsigned long nr_taken = 0;
- unsigned long scan;
+ unsigned long scan, nr;

for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
struct page *page;
- unsigned long pfn;
- unsigned long end_pfn;
- unsigned long page_pfn;
- int zone_id;

page = lru_to_page(src);
prefetchw_prev_lru_page(page, src, flags);
@@ -903,52 +962,15 @@ static unsigned long isolate_lru_pages(u
default:
BUG();
}
-
- if (!order)
- continue;
-
/*
- * Attempt to take all pages in the order aligned region
- * surrounding the tag page. Only take those pages of
- * the same active state as that tag page. We may safely
- * round the target page pfn down to the requested order
- * as the mem_map is guarenteed valid out to MAX_ORDER,
- * where that page is in a different zone we will detect
- * it from its zone id and abort this block scan.
+ * Lumpy reclaim tries to free nearby pages regardless of
+ * their LRU attributes (file, active, etc.).
*/
- zone_id = page_zone_id(page);
- page_pfn = page_to_pfn(page);
- pfn = page_pfn & ~((1 << order) - 1);
- end_pfn = pfn + (1 << order);
- for (; pfn < end_pfn; pfn++) {
- struct page *cursor_page;
-
- /* The target page is in the block, ignore it. */
- if (unlikely(pfn == page_pfn))
- continue;
-
- /* Avoid holes within the zone. */
- if (unlikely(!pfn_valid_within(pfn)))
- break;
-
- cursor_page = pfn_to_page(pfn);
-
- /* Check that we have not crossed a zone boundary. */
- if (unlikely(page_zone_id(cursor_page) != zone_id))
- continue;
- switch (__isolate_lru_page(cursor_page, mode, file)) {
- case 0:
- list_move(&cursor_page->lru, dst);
- nr_taken++;
- scan++;
- break;
-
- case -EBUSY:
- /* else it is being freed elsewhere */
- list_move(&cursor_page->lru, src);
- default:
- break; /* ! on LRU or wrong list */
- }
+ if (order && mode == ISOLATE_BOTH) {
+ /* try to reclaim pages near this one */
+ nr = try_lumpy_reclaim(page, dst, order);
+ nr_taken += nr;
+ scan += nr;
}
}
