Re: ZONE_NORMAL vs. ZONE_MOVABLE

From: Vlastimil Babka
Date: Thu Mar 30 2017 - 03:55:39 EST


On 03/20/2017 07:33 AM, Joonsoo Kim wrote:
>> The fact sticky movable pageblocks aren't ideal for CMA doesn't mean
>> they're not ideal for memory hotunplug though.
>>
>> With CMA there's no point in having the sticky movable pageblocks
>> scattered around and it's purely a misfeature to use sticky movable
>> pageblocks because you need the whole CMA area contiguous hence a
>> ZONE_CMA is ideal.
> No. CMA ranges could be registered many times for each devices and they
> could be scattered due to device's H/W limitation. So, current implementation
> in kernel, MIGRATE_CMA pageblocks, are scattered sometimes.
>
>> As opposed with memory hotplug the sticky movable pageblocks would
>> allow the kernel to satisfy the current /sys API and they would
>> provide no downside unlike in the CMA case where the size of the
>> allocation is unknown.
> No, same downside also exists in this case. Downside is not related to the case
> that device uses that range. It is related to VM management to this range and
> problems are the same. For example, with sticky movable pageblock, we need to
> subtract number of freepages in sticky movable pageblock when watermark is
> checked for non-movable allocation and it causes some problems.

Agreed. Right now for CMA we have to account NR_FREE_CMA_PAGES (the
number of free pages within MIGRATE_CMA pageblocks), which brings all
those hooks and other trouble needed to keep the accounting precise
(there used to be various races in there). This goes against the rest of
the page grouping by mobility design, which for performance reasons was
never meant to be precise (e.g. when you change a pageblock's type and
move pages between freelists, any pcpu cached pages are left on their
previous type's list).

We also can't ignore this accounting, as the watermark check could then
pass for e.g. an UNMOVABLE allocation, which would proceed to find that
the only free pages available are within the MIGRATE_CMA (or
sticky-movable) pageblocks, which it's not allowed to fall back to. If
we only went reclaiming at that point, the zone balance checks would
also consider the zone balanced, even though unmovable allocations
would still not be possible.
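
To make the failure mode concrete, here is a minimal userspace sketch of
the check. It is a toy model only, loosely following the idea of
subtracting NR_FREE_CMA_PAGES for allocations that may not use CMA
pageblocks. struct toy_zone, watermark_ok() and the account_cma switch
are made up for illustration and are not the kernel's code.

```c
#include <stdbool.h>

/* Toy model of one zone's counters; NOT the kernel's struct zone. */
struct toy_zone {
	long nr_free;      /* all free pages in the zone */
	long nr_free_cma;  /* subset sitting in MIGRATE_CMA pageblocks */
	long watermark;    /* free pages that must remain available */
};

/*
 * Hypothetical helper: does the zone pass the watermark for this request?
 * account_cma == false models "ignoring the accounting".
 */
static bool watermark_ok(const struct toy_zone *z, bool movable,
			 bool account_cma)
{
	long usable = z->nr_free;

	/* Unmovable requests cannot be served from CMA pageblocks. */
	if (account_cma && !movable)
		usable -= z->nr_free_cma;

	return usable > z->watermark;
}
```

With nr_free = 1000, nr_free_cma = 900 and watermark = 200, an unmovable
request passes when the CMA pages are not subtracted, although only 100
pages are actually usable for it. With the subtraction it correctly
fails, while a movable request still passes.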

Even with this extra accounting, things are not perfect, because reclaim
doesn't guarantee freeing the pages in the right pageblocks, so we can
easily overreclaim. That's mainly why I agreed that ZONE_CMA should be
better than the current implementation, and I'm skeptical about the
sticky-movable pageblock idea. Note the conversion to node-lru reclaim
has changed things somewhat, as we can't reclaim a single zone anymore,
but the accounting troubles remain.
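
As a rough illustration of the overreclaim, consider a toy model where
only one in every usable_every reclaimed pages ends up outside the
MIGRATE_CMA (or sticky-movable) pageblocks. The function name, its
parameters and the fixed ratio are assumptions for the sake of the
example; real reclaim has no such regular pattern.

```c
/*
 * Toy model: count how many pages must be reclaimed before the pages
 * usable for an unmovable allocation exceed the watermark.
 * Returns -1 if usable_every is not positive.
 */
static long pages_to_reclaim(long usable_free, long watermark,
			     long usable_every)
{
	long reclaimed = 0;

	if (usable_every <= 0)
		return -1;

	while (usable_free <= watermark) {
		reclaimed++;
		/* Only every usable_every-th freed page helps this request. */
		if (reclaimed % usable_every == 0)
			usable_free++;
	}
	return reclaimed;
}
```

Starting from 100 usable free pages with a watermark of 200, this needs
101 reclaimed pages when every freed page is usable, but 404 when only
one in four is. The other 303 pages were reclaimed for nothing as far as
the unmovable allocation is concerned.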