On Wed, 27 Jan 2016 22:19:14 -0800 Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new
mm zones that are bumping up against the current maximum limit of 4
zones, i.e. 2 bits in page->flags. When adding a zone this equation
still needs to be satisfied:
    SECTIONS_WIDTH + ZONES_WIDTH + NODES_SHIFT + LAST_CPUPID_SHIFT
        <= BITS_PER_LONG - NR_PAGEFLAGS
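To make the constraint concrete, here is a minimal sketch of the check in plain C. The helper name page_flags_fit() is hypothetical and not taken from the kernel, which evaluates this constraint from its configuration at build time; the widths in the usage note are made-up illustrative values, not those of any real configuration.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function: returns true when the
 * section, zone, node and last_cpupid fields all fit in the bits of
 * page->flags that are left over after the page flags themselves.
 */
static bool page_flags_fit(unsigned sections_width, unsigned zones_width,
                           unsigned nodes_shift, unsigned last_cpupid_shift,
                           unsigned bits_per_long, unsigned nr_pageflags)
{
    unsigned needed = sections_width + zones_width +
                      nodes_shift + last_cpupid_shift;

    /* Guard against unsigned underflow in the subtraction below. */
    if (nr_pageflags > bits_per_long)
        return false;

    return needed <= bits_per_long - nr_pageflags;
}
```

With illustrative widths of 18 section bits, 10 node bits, 8 last_cpupid bits and 26 page flags on a 64-bit build, 2 zone bits fit exactly, 3 zone bits do not, and 3 zone bits fit again once the node field is reduced to 9 bits.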
ZONE_DEVICE currently tries to satisfy this equation by requiring that
ZONE_DMA be disabled, but this is untenable given generic kernels want
to support ZONE_DEVICE and ZONE_DMA simultaneously. ZONE_CMA would like
to increase the amount of memory covered per section, but that limits
the minimum granularity at which consecutive memory ranges can be added
via devm_memremap_pages().
The trade-off of what is acceptable to sacrifice depends heavily on the
platform. For example, ZONE_CMA is targeted for 32-bit platforms where
page->flags is constrained, but those platforms likely do not care about
the minimum granularity of memory hotplug. A big-iron machine with 1024
NUMA nodes can likely sacrifice ZONE_DMA, whereas a general-purpose
distribution kernel cannot.
CONFIG_NR_ZONES_EXTENDED is a configuration symbol that gets selected
when the number of configured zones exceeds 4. It documents the
configuration symbols and definitions that get modified when ZONES_WIDTH
is greater than 2.
For now, it steals a bit from NODES_SHIFT. Later on it can be used to
document the definitions that get modified when a 32-bit configuration
wants more zone bits.
So if you want ZONE_DMA, you're limited to 512 NUMA nodes?
That seems reasonable.
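For reference, the arithmetic behind that question can be sketched as follows. Both helpers are hypothetical names used only for illustration, and the sketch assumes the node field starts at 10 bits (1024 nodes), matching the big-iron example in the changelog.

```c
/*
 * Hypothetical helper: number of bits needed to index nr_zones
 * distinct zones, i.e. the smallest width w with (1 << w) >= nr_zones.
 */
static unsigned zone_bits(unsigned nr_zones)
{
    unsigned bits = 0;

    while ((1u << bits) < nr_zones)
        bits++;
    return bits;
}

/* Hypothetical helper: NUMA nodes representable in nodes_shift bits. */
static unsigned long max_nodes(unsigned nodes_shift)
{
    return 1UL << nodes_shift;
}
```

Four zones need zone_bits(4) == 2 bits, while a fifth zone pushes that to zone_bits(5) == 3. If the extra bit comes out of a 10-bit node field, max_nodes(10) == 1024 drops to max_nodes(9) == 512.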