Re: [GIT PULL] Memory folios for v5.15

From: Vlastimil Babka
Date: Thu Sep 09 2021 - 08:59:32 EST


On 9/2/21 17:13, Zi Yan wrote:
>> You're really just recreating a crappier, less maintainable version of
>> the object packing that *slab already does*.
>>
>> It's *slab* that is supposed to deal with internal fragmentation, not
>> the page allocator.
>>
>> The page allocator is good at cranking out uniform, slightly big
>> memory blocks. The slab allocator is good at subdividing those into
>> smaller objects, neatly packed and grouped to facilitate contiguous
>> reclaim, while providing detailed breakdowns of per-type memory usage
>> and internal fragmentation to the user and to kernel developers.
>>
>> [ And introspection and easy reporting from production are *really
>> important*, because fragmentation issues develop over timelines that
>> extend the usual testing horizon of kernel developers. ]
>
> Initially, I thought it was a great idea to bump PAGE_SIZE to 2MB and
> use slab allocator like method for <2MB pages. But as I think about it
> more, I fail to see how it solves the existing fragmentation issues
> compared to our existing method, pageblock, since IMHO the fundamental
> issue of fragmentation in page allocation comes from mixing moveable
> and unmoveable pages in one pageblock, which does not exist in current
> slab allocation. There is no mix of reclaimable and unreclaimable objects
> in slab allocation, right?

AFAICS that's correct. Slab caches can in general merge, as that
decreases memory usage (with the tradeoff of potentially mixing more
objects with different lifetimes). But SLAB_RECLAIM_ACCOUNT (a flag for
reclaimable caches) is part of SLAB_MERGE_SAME, so caches can only merge
if they are both reclaimable or both not.

> In my mind, reclaimable object is an analog
> of moveable page and unreclaimable object is an analog of unmoveable page.

More precisely, it resembles reclaimable and unreclaimable pages. Movable
pages can also be migrated, but slab objects cannot.

> In addition, pageblock with different migrate types resembles how
> slab groups objects, so what is new in using slab instead of pageblock?

Slab would be stricter in not allowing the merge. At the page allocator
level, if memory is exhausted, a page of any type can eventually be
allocated from a pageblock of any other type as part of the fallback. The
only truly strict mechanism is the movable zone (ZONE_MOVABLE).

> My key question is do we allow mixing moveable sub-2MB data chunks with
> unmoveable sub-2MB data chunks in your new slab-like allocation method?
>
> If yes, how would kernel reclaim an order-0 (2MB) page that has an
> unmoveable sub-2MB data chunk? Isn’t it the same fragmentation situation
> we are facing nowadays when kernel tries to allocate a 2MB page but finds
> every 2MB pageblock has an unmoveable page?

Yes, any scheme where not all pages are movable can theoretically
degrade to a situation where at one moment all memory is allocated by
unmovable pages, and later almost all pages are freed, but one unmovable
page is left in each pageblock.
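To put a number on how bad that worst case is, assume a hypothetical machine with 16 GiB of memory, 2 MiB pageblocks and 4 KiB base pages (512 pages per pageblock). Pinning a single unmovable page in every pageblock costs:

```latex
\frac{16\,\mathrm{GiB}}{2\,\mathrm{MiB}} = 8192 \text{ pageblocks}, \qquad
8192 \times 4\,\mathrm{KiB} = 32\,\mathrm{MiB} \text{ pinned}, \qquad
\frac{32\,\mathrm{MiB}}{16\,\mathrm{GiB}} = \frac{1}{512} \approx 0.2\%
```

So with only about 0.2% of memory in use, not a single free 2 MiB block would remain, and neither reclaim nor compaction could produce one.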

> If no, why wouldn’t kernel do the same for pageblock? If kernel disallows
> page allocation fallbacks, so that unmoveable pages and moveable pages
> will not sit in a single pageblock, compaction and reclaim should be able
> to get a 2MB free page most of the time. And this would be a much smaller
> change, right?

If we restricted fallbacks that way, it would indeed be as strict as
slab is, but things could still degrade to unmovable pages scattered
across all the pageblocks, as mentioned above.

But since it's so similar to slabs, the same thing could happen with
slabs today, and I don't recall many reports of that happening? But
of course, today's slabs are not 2MB in size and used to serve 4k pages.

> Let me know if I miss anything.
>
>
> --
> Best Regards,
> Yan, Zi
>