Re: [PATCH 11/12] vmscan: Write out dirty pages in batch

From: Mel Gorman
Date: Tue Jun 15 2010 - 11:44:36 EST


On Tue, Jun 15, 2010 at 12:43:42PM +0100, Mel Gorman wrote:
> <SNIP on whether sorting should be in page cache or block layer>
>
> > It would be interesting to code up a little test patch though, see if
> > there's benefit to be had going down this path.
> >
>
> I'll do this just to see what it looks like. To be frank, I lack taste when
> it comes to how the block layer and filesystem should behave so am having
> troube deciding if sorting the pages prior to submission is a good thing or
> if it would just encourage bad or lax behaviour in the IO submission queueing.
>

The patch to sort the list being cleaned by reclaim looks like this.
It's not actually tested

vmscan: Sort pages being queued for IO before submitting to the filesystem

While page reclaim submits dirty pages in batch, it doesn't change the
order in which the IO is issued - it is still issued in LRU order. Given
that they are issued in a short period of time now, rather than across a
longer scan period, it is likely that it will not be any faster as:

a) IO will not be started as soon, and
b) the IO scheduler still only has a small re-ordering
window and will choke just as much on random IO patterns.

This patch uses list_sort() function to sort
the list; sorting the list of pages by mapping and page->index
within the mapping would result in all the pages on each mapping
being sent down in ascending offset order at once - exactly how the
filesystems want IO to be sent to it.

Credit mostly goes to Dave Chinner for this idea and the changelog text.

----

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 68b3d22..02ab246 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -32,6 +32,7 @@
#include <linux/topology.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
+#include <linux/list_sort.h>
#include <linux/notifier.h>
#include <linux/rwsem.h>
#include <linux/delay.h>
@@ -651,6 +652,34 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
__pagevec_free(&freed_pvec);
}

+/* Sort based on mapping then index */
+static int page_writeback_cmp(void *data, struct list_head *a, struct list_head *b)
+{
+ struct page *ap = list_entry(a, struct page, lru);
+ struct page *bp = list_entry(b, struct page, lru);
+ pgoff_t diff;
+
+ /*
+ * Page not locked but it's not critical, the mapping is just for sorting
+ * If the mapping is no longer valid, it's of little consequence
+ */
+ if (ap->mapping != bp->mapping) {
+ if (ap->mapping < bp->mapping)
+ return -1;
+ if (ap->mapping > bp->mapping)
+ return 1;
+ return 0;
+ }
+
+ /* Then index */
+ diff = ap->index - bp->index;
+ if (diff < 0)
+ return -1;
+ if (diff > 0)
+ return 1;
+ return 0;
+}
+
static noinline_for_stack void clean_page_list(struct list_head *page_list,
struct scan_control *sc)
{
@@ -660,6 +689,8 @@ static noinline_for_stack void clean_page_list(struct list_head *page_list,
if (!sc->may_writepage)
return;

+ list_sort(NULL, page_list, page_writeback_cmp);
+
/* Write the pages out to disk in ranges where possible */
while (!list_empty(page_list)) {
struct address_space *mapping;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/