I recall Rohit's patch from an earlier -mm. Without knowing anything about
his test, I am guessing he is getting cheap page colouring by preloading
the per-cpu cache with contiguous pages and his workload is faulting in
the batch of pages immediately by doing something like linearly reading a
large array. Hence, the mappings of his workload are getting the right
colour pages. This makes his workload a "lucky" workload. The general
benefit of preloading the percpu magazines is that there is a chance the
allocator only has to be called once, not pcp->batch times.
An odd/even allocation scheme could be provided by having two free_lists
in a free_area. One list for the "left buddy" and the other list for the
"right buddy". However, at best, that would provide two colours. I'm not
sure how much benefit it would give for the cost of more linked lists.
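
As a rough sketch of the left/right buddy idea (not code from any posted
patch), the side of a free block can be read straight from its page frame
number: bit "order" of the pfn says which half of the parent block it is.
The names free_area_2c, buddy_side_of() and add_free_block() are made up
for illustration, MAX_ORDER is an assumed value, and the lists are reduced
to counters.

```c
#include <assert.h>

#define MAX_ORDER 11	/* assumed value, for illustration only */

enum buddy_side { BUDDY_LEFT = 0, BUDDY_RIGHT = 1 };

/* Hypothetical free_area with one "list" (here just a counter) per side */
struct free_area_2c {
	unsigned long nr_free[2];
};

/*
 * pfn must be aligned to 2^order. Bit 'order' of the pfn tells us whether
 * the block is the left or the right half of its order+1 parent.
 */
static enum buddy_side buddy_side_of(unsigned long pfn, unsigned int order)
{
	assert(order < MAX_ORDER);
	return ((pfn >> order) & 1UL) ? BUDDY_RIGHT : BUDDY_LEFT;
}

/* Account a freed block on the list matching its side */
static void add_free_block(struct free_area_2c *areas,
			   unsigned long pfn, unsigned int order)
{
	areas[order].nr_free[buddy_side_of(pfn, order)]++;
}
```

With only one bit to go on, there are only ever two colours, which is the
limit mentioned above.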
To replicate the functionality of these patches with zones would require
two additional zones for NormalEasy and HighMemEasy (I suck at naming
things). The plus side is that once the zone fallback lists are updated,
the page allocator remains more or less the same as it is today. Then the
headaches start.
Problem 1: Zone fallback lists are "one-way" and per-node. Let's assume a
fallback list of HighMemEasy, HighMem, NormalEasy, Normal, DMA. Assuming
we are allocating PTEs from high memory, we could fall back to the Normal
zone even if pages are still free in HighMemEasy, simply because the
HighMem zone itself was out of pages. It would require very different
fallback logic to say that HighMem allocations can also use HighMemEasy
rather than falling back to Normal.
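
To make the one-way behaviour concrete, here is a toy model of the walk.
It is a sketch under assumptions, not the real zonelist code: the zone
names follow the example list above, pick_zone() and the masks are
invented, and I am assuming a kernel allocation such as a PTE page starts
at HighMem and may not use either of the Easy zones.

```c
#include <assert.h>

/* Zones in the order of the example fallback list */
enum zone_id {
	ZONE_HIGHMEM_EASY, ZONE_HIGHMEM, ZONE_NORMAL_EASY,
	ZONE_NORMAL, ZONE_DMA, NR_ZONES
};

#define ZBIT(z)		(1u << (z))
/* Assumption: kernel allocations may not use the Easy zones */
#define KERNEL_ZONES	(ZBIT(ZONE_HIGHMEM) | ZBIT(ZONE_NORMAL) | ZBIT(ZONE_DMA))
#define ALL_ZONES	(ZBIT(NR_ZONES) - 1u)

/*
 * Walk the list from 'start' towards the end only. Zones before 'start'
 * are never considered, which is what makes the list one-way.
 * Returns the zone picked, or -1 if nothing is free.
 */
static int pick_zone(const unsigned long *free_pages, enum zone_id start,
		     unsigned int allowed)
{
	assert(start < NR_ZONES);
	for (int z = (int)start; z < NR_ZONES; z++) {
		if (!(allowed & ZBIT(z)))
			continue;
		if (free_pages[z] > 0)
			return z;
	}
	return -1;
}

/* Example state: HighMem is empty but HighMemEasy has plenty free */
static const unsigned long example_free[NR_ZONES] = {
	[ZONE_HIGHMEM_EASY] = 100,
	[ZONE_HIGHMEM]      = 0,
	[ZONE_NORMAL_EASY]  = 50,
	[ZONE_NORMAL]       = 20,
	[ZONE_DMA]          = 10,
};
```

In this model, pick_zone(example_free, ZONE_HIGHMEM, KERNEL_ZONES) returns
ZONE_NORMAL even though HighMemEasy has 100 pages free, because the walk
never looks backwards in the list.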
Problem 2: The zone size will be a very difficult tunable to get right.
Right off, we are introducing a tunable which will make foreheads furrow.
If the tunable is set wrong, system performance will suffer and we could
see situations where kernel allocations fail because their zone got
depleted.
Problem 3: To get rid of the tunable, we could try resizing the zones
dynamically but that will be hard. Obviously, the zones are going to be
physically adjacent to each other. To resize the zone, the pages at one
end of the zone will need to be free. Shrinking the NormalEasy zone would
be easy enough, but shrinking the Normal zone with kernel pages in it
would be considerably harder, if not outright impossible. One page in the
wrong place will mean the zone cannot be resized.
Problem 4: Page reclaim would have two new zones to deal with, bringing
with it a new set of zone balancing problems. That brings its own special
brand of fun.
There may be more problems but these 4 are fairly important. This patchset
does not suffer from the same problems.
Problem 1: This patchset has a fallback list for each allocation type. So
EasyRclm allocations can just as easily use an area reserved for kernel
allocations and vice versa. Obviously we don't like when this happens, but
when it does, things start fragmenting rather than breaking.
Problem 2: The number of pages that get reserved for each type grows and
shrinks on demand. There is no tunable and no need for one.
Problem 3: Problem doesn't exist for this patchset.
Problem 4: Problem doesn't exist for this patchset.
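
For contrast with the one-way zone list, the per-type fallback in the
answer to Problem 1 can be pictured roughly like this. It is a toy with
only the two types named above; the enum, the fallback_order table and
alloc_page_of_type() are illustrative names I am making up here, not the
identifiers used in the patches, and the reserved areas are reduced to
free-page counters.

```c
#include <assert.h>

enum alloc_type { ALLOC_KERNEL, ALLOC_EASYRCLM, NR_ALLOC_TYPES };

/* Each type tries areas reserved for itself first, then the other type's */
static const enum alloc_type fallback_order[NR_ALLOC_TYPES][NR_ALLOC_TYPES] = {
	[ALLOC_KERNEL]   = { ALLOC_KERNEL,   ALLOC_EASYRCLM },
	[ALLOC_EASYRCLM] = { ALLOC_EASYRCLM, ALLOC_KERNEL   },
};

/*
 * Take one page for 'type'. Returns the type of area the page actually
 * came from, or -1 if every area is empty. A return value different
 * from 'type' means we fell back, i.e. memory fragments a little, but
 * the allocation does not fail.
 */
static int alloc_page_of_type(unsigned long *free_in_area,
			      enum alloc_type type)
{
	assert((int)type < NR_ALLOC_TYPES);
	for (int i = 0; i < NR_ALLOC_TYPES; i++) {
		enum alloc_type from = fallback_order[type][i];

		if (free_in_area[from] > 0) {
			free_in_area[from]--;
			return (int)from;
		}
	}
	return -1;
}

/* Example state: one page left in kernel areas, two in EasyRclm areas */
static unsigned long example_area_free[NR_ALLOC_TYPES] = {
	[ALLOC_KERNEL]   = 1,
	[ALLOC_EASYRCLM] = 2,
};
```

Because the table has a row per type, fallback works in both directions,
which is what a single ordered zone list cannot express.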
Bottom line, using zones will be more complex than this set of patches and
bring a lot of tricky issues with it.