Re: [Bug 8464] New: autoreconf: page allocation failure. order:2, mode:0x84020

From: Mel Gorman
Date: Fri May 11 2007 - 13:38:24 EST


On (11/05/07 13:51), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:
>
> > > seems to have cured the system so far (need to load it a bit longer to
> > > be sure)
> > >
> >
> > The longer it runs the better, particularly under load and after
> > updatedb has run. Thanks a lot for testing
>
> After a few hours of load testing still nothing in the logs, so the
> revert was probably the right thing to do

Excellent. I am somewhat surprised by the result, so I'd like to look at the
alternative option with kswapd as well. Could you put that patch back in again
please, and try the following patch instead? The patch causes kswapd to reclaim
at higher orders if it's requested to. Christoph, can you look at the patch
as well and make sure it's doing the right thing with respect to SLUB please?

Ultimately, it's probably still a bad plan for atomic allocations to
depend on high-order allocations being possible, but it's interesting to
see how it behaves.

Thanks

=====

Subject: [RFC] Have kswapd keep pages free at a minimum order other than order-0

kswapd normally reclaims at order 0 unless there is a higher-order allocation
under way. However, in some cases it is known that there is a minimum order
of general interest, such as the SLUB allocator requiring regular high-order
allocations. This allows a minimum order to be set so that min_free_kbytes
is kept free at higher orders.
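
To illustrate why the order matters: the page allocator's watermark check
requires free pages at every order up to the requested one, so a zone can
pass its order-0 watermark while having nothing usable at order-4. Below is
a rough userspace sketch of that per-order logic, loosely modelled on
zone_watermark_ok() in mm/page_alloc.c. The fake_zone structure, the
watermark_ok() name and the numbers are illustrative only; the real check
also accounts for lowmem_reserve and allocation flags.

#include <stdio.h>

#define MAX_ORDER 11

struct fake_zone {
	long free_pages;		/* total free pages in the zone */
	long nr_free[MAX_ORDER];	/* free blocks at each order */
};

/* Simplified per-order watermark check */
static int watermark_ok(struct fake_zone *z, int order, long mark)
{
	long min = mark;
	long free_pages = z->free_pages;
	int o;

	if (free_pages <= min)
		return 0;

	for (o = 0; o < order; o++) {
		/* Blocks at this order cannot satisfy a higher-order request */
		free_pages -= z->nr_free[o] << o;

		/* Require fewer, but still some, free pages at each order */
		min >>= 1;

		if (free_pages <= min)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* Plenty of order-0/order-1 pages, nothing above: badly fragmented */
	struct fake_zone z = {
		.free_pages = 1200,
		.nr_free = { 1000, 100, 0 },
	};

	printf("order-0 ok: %d\n", watermark_ok(&z, 0, 256));	/* passes */
	printf("order-4 ok: %d\n", watermark_ok(&z, 4, 256));	/* fails */
	return 0;
}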

With a simple stress test, buddyinfo looks like this at the end with kswapd
at its defaults (each column is the number of free blocks at orders 0-10):

Node 0, zone      DMA     10     12     11     10     10     10      6      5      4      1      0
Node 0, zone   Normal     87    232    601    490    369    282    197    116     79     39     28

With kswapd attempting to keep pages free at order-4, it looks like this:

Node 0, zone      DMA     35     37     29     28     22     18      8      6      0      1      0
Node 0, zone   Normal     96    203    361    355    265    203    141     97     57     34     48
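
(Blocks at order >= 4 are the ones that matter for this bug. If it helps
when comparing runs, here is a rough userspace helper that sums the pages
available at or above a given order; it assumes the usual "Node N, zone
NAME c0 ... c10" layout of /proc/buddyinfo and does minimal error handling.)

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/buddyinfo", "r");
	char line[512];
	const int min_order = 4;	/* order of interest */

	if (!f) {
		perror("/proc/buddyinfo");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		char zone[32];
		long count, pages = 0;
		int order = 0, n;
		char *p = strstr(line, "zone");

		if (!p || sscanf(p, "zone %31s%n", zone, &n) != 1)
			continue;
		for (p += n; sscanf(p, "%ld%n", &count, &n) == 1; p += n) {
			if (order >= min_order)
				pages += count << order;	/* pages, not blocks */
			order++;
		}
		printf("zone %-8s: %ld pages free at order >= %d\n",
		       zone, pages, min_order);
	}
	fclose(f);
	return 0;
}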

---
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-kswapd_minorder/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/include/linux/mmzone.h 2007-05-11 11:12:43.000000000 +0100
@@ -499,6 +499,8 @@ typedef struct pglist_data {
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
+int kswapd_order(unsigned int order);
+void set_kswapd_order(unsigned int order);
 void wakeup_kswapd(struct zone *zone, int order);
 int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 		int classzone_idx, int alloc_flags);
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/slub.c linux-2.6.21-mm2-kswapd_minorder/mm/slub.c
--- linux-2.6.21-mm2-clean/mm/slub.c 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/mm/slub.c 2007-05-11 11:10:08.000000000 +0100
@@ -2131,6 +2131,7 @@ static struct kmem_cache *kmalloc_caches
 static int __init setup_slub_min_order(char *str)
 {
 	get_option (&str, &slub_min_order);
+	set_kswapd_order(slub_min_order);
 	user_override = 1;
 	return 1;
 }
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/vmscan.c linux-2.6.21-mm2-kswapd_minorder/mm/vmscan.c
--- linux-2.6.21-mm2-clean/mm/vmscan.c 2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-kswapd_minorder/mm/vmscan.c 2007-05-11 11:55:45.000000000 +0100
@@ -1407,6 +1407,32 @@ out:
 	return nr_reclaimed;
 }
 
+static unsigned int kswapd_min_order __read_mostly;
+
+/**
+ * set_kswapd_order - Set the minimum order that kswapd reclaims at
+ * @order: The new minimum order
+ *
+ * kswapd normally reclaims at order 0 unless there is a higher-order
+ * allocation under way. However, in some cases it is known that there
+ * is a minimum order of general interest, such as the SLUB allocator
+ * requiring regular high-order allocations. This allows a minimum order
+ * to be set so that min_free_kbytes is kept free at higher orders.
+ */
+void set_kswapd_order(unsigned int order)
+{
+	if (order >= MAX_ORDER)
+		return;
+
+	printk(KERN_INFO "kswapd reclaim order set to %u\n", order);
+	kswapd_min_order = order;
+}
+
+int kswapd_order(unsigned int order)
+{
+	return max(kswapd_min_order, order);
+}
+
 /*
  * The background pageout daemon, started as a kernel thread
  * from the init process.
@@ -1450,13 +1476,13 @@ static int kswapd(void *p)
 	 */
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 
-	order = 0;
+	order = kswapd_order(0);
 	for ( ; ; ) {
 		unsigned long new_order;
 
 		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
-		new_order = pgdat->kswapd_max_order;
-		pgdat->kswapd_max_order = 0;
+		new_order = kswapd_order(pgdat->kswapd_max_order);
+		pgdat->kswapd_max_order = kswapd_order(0);
 		if (order < new_order) {
 			/*
 			 * Don't sleep if someone wants a larger 'order'
@@ -1467,7 +1493,7 @@ static int kswapd(void *p)
 			if (!freezing(current))
 				schedule();
 
-			order = pgdat->kswapd_max_order;
+			order = kswapd_order(pgdat->kswapd_max_order);
 		}
 		finish_wait(&pgdat->kswapd_wait, &wait);
 
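As a testing note: with this applied, booting with slub_min_order=4 on the
kernel command line should now also raise kswapd's reclaim order, since
setup_slub_min_order() above calls set_kswapd_order(). The

	kswapd reclaim order set to 4

printk at boot should confirm it took effect. (The boot parameter handling
itself is whatever the SLUB code in -mm currently does; the only new part
is the set_kswapd_order() call.)
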
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab