Re: [discuss] Re: 32-bit dma allocations on 64-bit platforms

From: Andrea Arcangeli
Date: Thu Jun 24 2004 - 12:43:09 EST


On Fri, Jun 25, 2004 at 01:48:47AM +1000, Nick Piggin wrote:
> 2.6 has the "incremental min" thing. What is wrong with that?
> Though I think it is turned off by default.

I looked more into it and you can leave it turned off since it's not
going to work.

It's all a function of z->pages_*, and those are _global_ for all the
zones, which in turn makes them absolutely meaningless.

The algorithm has nothing in common with lowmem_reserve_ratio. The
effect has a tiny bit of similarity, but the incremental min thing is so
weak and so bad that it will either not help or it'll waste tons of
memory. Furthermore, you cannot set a sysctl value that works for all
machines. The whole thing should be dropped and replaced with the fine
production-quality lowmem_reserve_ratio in 2.4.26+

(The only broken thing about lowmem_reserve_ratio is that it cannot be
tuned, not even at boot time; a recompile is needed. That's fixable, so
it could be tuned at boot time and in theory at runtime too, but the
point is that no dynamic tuning is required with it.)


Please focus on this code from 2.4:

/*
 * We don't know if the memory that we're going to allocate will
 * be freeable or/and it will be released eventually, so to
 * avoid totally wasting several GB of ram we must reserve some
 * of the lower zone memory (otherwise we risk to run OOM on the
 * lower zones despite there's tons of freeable ram on the
 * higher zones).
 */
zone_watermarks_t watermarks[MAX_NR_ZONES];

typedef struct zone_watermarks_s {
	unsigned long min, low, high;
} zone_watermarks_t;

class_idx = zone_idx(classzone);

for (;;) {
	zone_t *z = *(zone++);
	if (!z)
		break;

	if (zone_free_pages(z, order) >
			z->watermarks[class_idx].low) {
		page = rmqueue(z, order);
		if (page)
			return page;
	}
}
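
To make the point of that loop easier to see outside the kernel, here is a minimal userspace sketch of the same check. The names sk_zone, sk_pick_zone and SK_NR_ZONES are made up for illustration and are not kernel identifiers, and the sketch only picks a zone instead of actually taking pages off a free list. The key detail it keeps is that each zone is compared against the "low" watermark of the caller's classzone, not against a single per-zone value.

```c
#include <stddef.h>

#define SK_NR_ZONES 3	/* assumption: 0 = DMA, 1 = NORMAL, 2 = HIGHMEM */

/* Hypothetical stand-in for zone_t, reduced to what the check needs. */
struct sk_zone {
	unsigned long free_pages;
	unsigned long low[SK_NR_ZONES];	/* "low" watermark per classzone */
};

/*
 * Walk a NULL-terminated zonelist (highest zone first) and return the
 * first zone whose free pages exceed the low watermark that applies to
 * the caller's classzone, or NULL if no zone qualifies.
 */
static struct sk_zone *sk_pick_zone(struct sk_zone **zonelist, int class_idx)
{
	for (;;) {
		struct sk_zone *z = *(zonelist++);
		if (!z)
			break;

		if (z->free_pages > z->low[class_idx])
			return z;
	}
	return NULL;
}
```

With a lower zone that has 500 free pages and low watermarks of 20, 120 and 1120 for the three classzones, a HIGHMEM-class request is refused from that zone while NORMAL-class and DMA-class requests are still served from it.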


zone->watermarks[j].min = mask;
zone->watermarks[j].low = mask*2;
zone->watermarks[j].high = mask*3;
/* now set the watermarks of the lower zones in the "j"
 * classzone */
for (idx = j-1; idx >= 0; idx--) {
	zone_t * lower_zone = pgdat->node_zones + idx;
	unsigned long lower_zone_reserve;
	if (!lower_zone->size)
		continue;

	mask = lower_zone->watermarks[idx].min;
	lower_zone->watermarks[j].min = mask;
	lower_zone->watermarks[j].low = mask*2;
	lower_zone->watermarks[j].high = mask*3;

	/* now the brainer part */
	lower_zone_reserve = realsize /
		lower_zone_reserve_ratio[idx];
	lower_zone->watermarks[j].min +=
		lower_zone_reserve;
	lower_zone->watermarks[j].low +=
		lower_zone_reserve;
	lower_zone->watermarks[j].high +=
		lower_zone_reserve;

	realsize += lower_zone->realsize;
}
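
The arithmetic above can be tried standalone with the rough sketch below. It is not the kernel code: zone_sketch, setup_watermarks, base_min and reserve_ratio are hypothetical names, the per-zone base minimum is simply passed in instead of being derived as in the kernel, and the outer loop over j is an assumption about how the fragment is driven. What it does keep is the accumulation: the reserve added to a lower zone grows with the total size of all the zones above it in the classzone.

```c
#include <assert.h>

#define NR_ZONES 3	/* assumption: 0 = DMA, 1 = NORMAL, 2 = HIGHMEM */

struct wm {
	unsigned long min, low, high;
};

/* Hypothetical stand-in for zone_t. */
struct zone_sketch {
	unsigned long realsize;		/* pages in this zone */
	struct wm watermarks[NR_ZONES];	/* indexed by classzone */
};

/*
 * base_min[j] is the plain min watermark of zone j (an input here).
 * reserve_ratio[idx] must be non-zero for every lower zone idx.
 */
static void setup_watermarks(struct zone_sketch *zones,
			     const unsigned long *base_min,
			     const unsigned long *reserve_ratio)
{
	for (int j = 0; j < NR_ZONES; j++) {
		unsigned long realsize = zones[j].realsize;
		unsigned long mask = base_min[j];

		zones[j].watermarks[j].min = mask;
		zones[j].watermarks[j].low = mask * 2;
		zones[j].watermarks[j].high = mask * 3;

		/* set the watermarks of the lower zones in classzone j */
		for (int idx = j - 1; idx >= 0; idx--) {
			struct zone_sketch *lower = &zones[idx];
			unsigned long reserve;

			if (!lower->realsize)
				continue;

			mask = lower->watermarks[idx].min;
			assert(reserve_ratio[idx] != 0);
			/* reserve scales with the memory above this zone */
			reserve = realsize / reserve_ratio[idx];

			lower->watermarks[j].min = mask + reserve;
			lower->watermarks[j].low = mask * 2 + reserve;
			lower->watermarks[j].high = mask * 3 + reserve;

			realsize += lower->realsize;
		}
	}
}
```

For example, with zones of 1000, 10000 and 100000 pages, base minimums of 10, 20 and 30, and ratios of 100 and 10 (all illustrative numbers, not kernel defaults), the lowest zone gets min watermarks of 10, 110 and 1110 for the three classzones, so the protection scales with the machine without any tuning.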


The 2.6 algorithm controlled by the sysctl does nothing similar to the
above.