Re: [PATCH 0/8] Create ZONE_MOVABLE to partition memory betweenmovable and non-movable pages

From: Mel Gorman
Date: Fri Jan 26 2007 - 12:20:33 EST


On Fri, 26 Jan 2007, Christoph Lameter wrote:

On Fri, 26 Jan 2007, Mel Gorman wrote:

For arches that do not have HIGHMEM other zones would be okay too it
seems.
It would, but it'd obscure the code to take advantage of that.

No MOVABLE memory for 64 bit platforms that do not have HIGHMEM right now?


err, no, I misinterpreted what you meant by "other zones would be ok..". I though you were suggesting the reuse of zone names for some reason.

The zone used to for ZONE_MOVABLE is the highest populated zone on the architecture. On some architectures, that will be ZONE_HIGHMEM. On others, it will be ZONE_DMA. See the function find_usable_zone_for_movable()

ZONE_MOVABLE never spans zones. For example, it will not use some ZONE_HIGHMEM and some ZONE_NORMAL memory.

The anti-fragmentation code could potentially be used to have subzone groups
that kept movable and unmovable allocations as far apart as possible and at
opposite ends of a zone. That approach has been kicked a few times because of
complexity.

Hmm... But his patch also introduces additional complexity plus its
difficult to handle for the end user.


It's harder for the user to setup all right. But it works within limits that are known well in advance and doesn't add additional code to the main allocator path. Once it's setup, it acts like any other zone and zone behavior is better understood than anti-fragmentations behavior.

There are some NUMA architectures that are not that
symmetric.
I know, it's why find_zone_movable_pfns_for_nodes() is as complex as it is.
The mechanism spreads the unmovable memory evenly throughout all nodes. In the
event some nodes are too small to hold their share, the remaining unmovable
memory is divided between the nodes that are larger.

I would have expected a percentage of a node. If equal amounts of
unmovable memory are assigned to all nodes at first then there will be
large disparities in the amount of movable memories f.e. between a node
with 8G memory compared to a node with 1GB memory.


On the other hand, percentages make it harder for the administrator to know in advance how much unmovable memory will be available when the system starts even if the machine changes configuration. The absolute figure is easier to understand. If there was a requirement, an alternative configuration option could be made available that takes a fixed percentage of each node with memory.

How do you handle headless nodes? I.e. memory nodes with no processors?

The code only cares about memory, not processors.

Those may be particularly large compared to the rest but these are mainly
used for movable pages since unmovable things like device drivers buffers
have to be kept near the processors that take the interrupt.


Then what I'd do is specify kernelcore to be

(number_of_nodes_with_processors * largest_amount_of_memory_on_node_with_processors)

That would have all memory near processors available as unmovable memory (that movable allocations will still use so they don't always go remote) while keeping a large amount of memory on the headless nodes for movable allocations only.

If requirements demanded, a configuration option could be made that allows the administrator to specify exactly how much unmovable memory he wants on a specific node.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/