Some thoughts on memory policies

From: Christoph Lameter
Date: Mon Jun 18 2007 - 16:22:28 EST


I think we are getting into more and more of a mess with the existing
memory policies. The refcount issue with shmem is just one bad symptom of
it. Memory policies were intended to be process based and
taking them out of that context causes various issues.

I have thought for a long time that we need something to replace memory
policies especially since the requirements on memory policies go far
beyond just being process based. So some requirements and ideas about
memory policies.

1. Memory policies must be attachable to a variety of objects

- Device drivers may need to control memory allocations in
devices either because their DMA engines can only reach a
subsection of the system or because memory transfer
performance is superior in certain nodes of the system.

- Process. This is the classic usage scenario

- File / Socket. One may have particular reasons to place
objects on a set of nodes because of how the threads of
an application are spread out in the system.

- Cpuset / Container. Some simple support is there with
memory spreading today. That could be made more universal.

- System policies. The system policy is currently not
modifiable. It may be useful to be able to set this.
Small NUMA systems may want to run with interleave by default

- Address range. For the virtual memory address range
this is included in todays functionality but one may also
want to control the physical address range to make sure
f.e. that memory is allocated in an area where a device
can reach it.

- Memory policies need to be attachable to types of pages.
F.e. executable pages of a threaded application are best
spread (or replicated) whereas the stack and the data may
best be allocated in a node local way.
Useful categories that I can think of
Stack, Data, Filebacked pages, Anonymous Memory,
Shared memory, Page tables, Slabs, Mlocked pages and
huge pages.

Maybe a set of global policies would be useful for these
categories. Andy hacked subsystem memory policies into
shmem and it seems that we are now trying to do the same
for hugepages. Maybe we could get to a consistent scheme
here?

2. Memory policies need to support additional constraints

- Restriction to a set of nodes. That is what we have today.

- Restriction to a container or cpuset. Maybe restriction
to a set of containers?

- Strict vs no strict allocations. A strict allocation needs
to fail if the constraints cannot be met. A non strict
allocation can fall back.

- Order of allocation. Higher order pages may require
different allocation constraints? This is like a
generalization of huge page policies.

- Locality placement. These are node local, interleave etc.

3. Additional flags

- Automigrate flag so that memory touched by a process
is moved to a memory location that has best performance.

- Page order flag that determines the preferred allocation
order. Maybe useful in connection with the large blocksize
patch to control anonymous memory orders.

- Replicate flags so that memory is replicated.

4. Policy combinations

We need some way to combine policies in a systematic way. The current
hieracy from System->cpuset->proces->memory range does not longer
work if a process can use policies set up in shmem or huge pages.
Some consistent scheme to combine memory policies would also need
to be able to synthesize different policies. I.e. automigrate
can be combined with node local or interleave and a cpuset constraint.

5. Management tools

If we make the policies more versatile then we need the proper
management tools in user space to set and display these policies
in such a way that they can be managed by the end user. The esoteric
nature of memory policy semantics makes them difficult to comprehend.

6. GFP_xx flags may actually be considered as a form of policy

i.e. GFP_THISNODE is essentially a one node cpuset.

GFP_DMA and GFP_DMA32 are physical address range constraints.

GFP_HARDWALL is a strict vs. nonstrict distinction.


7. Allocators must change

Right now the policy is set by the process context which is bad because
one cannot specify a memory policy for an allocation. It must be possible
to pass a memory policy to the allocators and then get the memory
requested.


I wish we could come up with some universal scheme that encompasses all
of the functionality we want and that makes memory more manageable....


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/