[patch 07/71] mm: mark the correct zone as full when scanningzonelists

From: Greg KH
Date: Mon Oct 06 2008 - 20:43:05 EST


2.6.26-stable review patch. If anyone has any objections, please let us
know.

------------------
From: Mel Gorman <mel@xxxxxxxxx>

commit 5bead2a0680687b9576d57c177988e8aa082b922 upstream

The iterator for_each_zone_zonelist() uses a struct zoneref *z cursor when
scanning zonelists to keep track of where in the zonelist it is. The
zoneref that is returned corresponds to the the next zone that is to be
scanned, not the current one. It was intended to be treated as an opaque
list.

When the page allocator is scanning a zonelist, it marks elements in the
zonelist corresponding to zones that are temporarily full. As the
zonelist is being updated, it uses the cursor here;

if (NUMA_BUILD)
zlc_mark_zone_full(zonelist, z);

This is intended to prevent rescanning in the near future but the zoneref
cursor does not correspond to the zone that has been found to be full.
This is an easy misunderstanding to make so this patch corrects the
problem by changing zoneref cursor to be the current zone being scanned
instead of the next one.

Signed-off-by: Mel Gorman <mel@xxxxxxxxx>
Cc: Andy Whitcroft <apw@xxxxxxxxxxxx>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>

---
include/linux/mmzone.h | 12 ++++++------
mm/mmzone.c | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -751,8 +751,9 @@ static inline int zonelist_node_idx(stru
*
* This function returns the next zone at or below a given zone index that is
* within the allowed nodemask using a cursor as the starting point for the
- * search. The zoneref returned is a cursor that is used as the next starting
- * point for future calls to next_zones_zonelist().
+ * search. The zoneref returned is a cursor that represents the current zone
+ * being examined. It should be advanced by one before calling
+ * next_zones_zonelist again.
*/
struct zoneref *next_zones_zonelist(struct zoneref *z,
enum zone_type highest_zoneidx,
@@ -768,9 +769,8 @@ struct zoneref *next_zones_zonelist(stru
*
* This function returns the first zone at or below a given zone index that is
* within the allowed nodemask. The zoneref returned is a cursor that can be
- * used to iterate the zonelist with next_zones_zonelist. The cursor should
- * not be used by the caller as it does not match the value of the zone
- * returned.
+ * used to iterate the zonelist with next_zones_zonelist by advancing it by
+ * one before calling.
*/
static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
enum zone_type highest_zoneidx,
@@ -795,7 +795,7 @@ static inline struct zoneref *first_zone
#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \
zone; \
- z = next_zones_zonelist(z, highidx, nodemask, &zone)) \
+ z = next_zones_zonelist(++z, highidx, nodemask, &zone)) \

/**
* for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -69,6 +69,6 @@ struct zoneref *next_zones_zonelist(stru
(z->zone && !zref_in_nodemask(z, nodes)))
z++;

- *zone = zonelist_zone(z++);
+ *zone = zonelist_zone(z);
return z;
}

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/