Re: ZONE_PADDING wastes 4 bytes of the new cacheline

From: Andrea Arcangeli
Date: Mon Oct 25 2004 - 07:48:17 EST


The more I read the old protection setting the more it seems broken. It
even looks it would buffer overflow if it wasn't by luck that
GFP_ZONETYPES == GFP_ZONEMASK, and I suspect the Non-loner version of
GFP_ZONETYPES would trigger it. I don't understand what loner means.
online dict returns:

loner
n : a person who avoids the company or assistance of others

nosense for non-english speaker in GFP_ZONETYPES context:

/* #define GFP_ZONETYPES (GFP_ZONEMASK + 1) */ /* Non-loner */
#define GFP_ZONETYPES ((GFP_ZONEMASK + 1) / 2 + 1) /* Loner */


anyways GFP_ZONETYPES has absolutely nothing to do with this code, no
idea how it splipped into it, this is not a zonelist, this is a
node_zones array only in function of MAX_NR_ZONES and it makes no sense
to me to use anything but MAX_NR_ZONES to initialialize it.

This is my forward port from 2.4 that should fix it.

static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
int j, idx;

for_each_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
zone_t *zone = pgdat->node_zones + j;
unsigned long presentpages = zone->presentpages;

zone->lowmem_reserve[j] = 0;

for (idx = j-1; idx >= 0; idx--) {
zone_t * lower_zone = pgdat->node_zones + idx;

lower_zone->lowmem_reserve[j] = presetpages / lower_zone_reserve_ratio[idx];
presentpages += lower_zone->presentpages;
}
}
}
}

here a diff to compare it with current 2.6 code.

diff -urNp --exclude CVS --exclude BitKeeper --exclude {arch} --exclude .arch-ids -u sles-ref/mm/page_alloc.c sles/mm/page_alloc.c
--- sles-ref/mm/page_alloc.c 2004-10-25 13:50:37.287995072 +0200
+++ sles/mm/page_alloc.c 2004-10-25 14:37:23.358407528 +0200
@@ -1905,87 +1892,29 @@ void __init page_alloc_init(void)
hotcpu_notifier(page_alloc_cpu_notify, 0);
}

-static unsigned long higherzone_val(struct zone *z, int max_zone,
- int alloc_type)
-{
- int z_idx = zone_idx(z);
- struct zone *higherzone;
- unsigned long pages;
-
- /* there is no higher zone to get a contribution from */
- if (z_idx == MAX_NR_ZONES-1)
- return 0;
-
- higherzone = &z->zone_pgdat->node_zones[z_idx+1];
-
- /* We always start with the higher zone's protection value */
- pages = higherzone->protection[alloc_type];
-
- /*
- * We get a lower-zone-protection contribution only if there are
- * pages in the higher zone and if we're not the highest zone
- * in the current zonelist. e.g., never happens for GFP_DMA. Happens
- * only for ZONE_DMA in a GFP_KERNEL allocation and happens for ZONE_DMA
- * and ZONE_NORMAL for a GFP_HIGHMEM allocation.
- */
- if (higherzone->present_pages && z_idx < alloc_type)
- pages += higherzone->pages_low * sysctl_lower_zone_protection;
-
- return pages;
-}
-
/*
- * setup_per_zone_protection - called whenver min_free_kbytes or
- * sysctl_lower_zone_protection changes. Ensures that each zone
- * has a correct pages_protected value, so an adequate number of
+ * setup_per_zone_lowmem_reserve - called whenever
+ * sysctl_lower_zone_reserve_ratio changes. Ensures that each zone
+ * has a correct pages reserved value, so an adequate number of
* pages are left in the zone after a successful __alloc_pages().
- *
- * This algorithm is way confusing. I tries to keep the same behavior
- * as we had with the incremental min iterative algorithm.
*/
-static void setup_per_zone_protection(void)
+static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
- struct zone *zones, *zone;
- int max_zone;
- int i, j;
+ int j, idx;

for_each_pgdat(pgdat) {
- zones = pgdat->node_zones;
+ for (j = 0; j < MAX_NR_ZONES; j++) {
+ zone_t *zone = pgdat->node_zones + j;
+ unsigned long presentpages = zone->presentpages;

- for (i = 0, max_zone = 0; i < MAX_NR_ZONES; i++)
- if (zones[i].present_pages)
- max_zone = i;
+ zone->lowmem_reserve[j] = 0;

- /*
- * For each of the different allocation types:
- * GFP_DMA -> GFP_KERNEL -> GFP_HIGHMEM
- */
- for (i = 0; i < GFP_ZONETYPES; i++) {
- /*
- * For each of the zones:
- * ZONE_HIGHMEM -> ZONE_NORMAL -> ZONE_DMA
- */
- for (j = MAX_NR_ZONES-1; j >= 0; j--) {
- zone = &zones[j];
+ for (idx = j-1; idx >= 0; idx--) {
+ zone_t * lower_zone = pgdat->node_zones + idx;

- /*
- * We never protect zones that don't have memory
- * in them (j>max_zone) or zones that aren't in
- * the zonelists for a certain type of
- * allocation (j>=i). We have to assign these
- * to zero because the lower zones take
- * contributions from the higher zones.
- */
- if (j > max_zone || j >= i) {
- zone->protection[i] = 0;
- continue;
- }
- /*
- * The contribution of the next higher zone
- */
- zone->protection[i] = higherzone_val(zone,
- max_zone, i);
+ lower_zone->lowmem_reserve[j] = presetpages / lower_zone_reserve_ratio[idx];
+ presentpages += lower_zone->presentpages;
}
}
}

quite some garbage got deleted:

andrea@dualathlon:~/devel/kernel> diffstat /tmp/z
page_alloc.c | 84 +++++++++--------------------------------------------------
1 files changed, 13 insertions(+), 71 deletions(-)
andrea@dualathlon:~/devel/kernel>

I'll adapt the rest of the code to deal with this and test it in a few
more minutes.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/