Re: [PATCH 6/6] mm: page_alloc: consolidate free page accounting

From: Vlastimil Babka
Date: Wed Sep 13 2023 - 16:18:22 EST


On 9/11/23 21:41, Johannes Weiner wrote:
> Free page accounting currently happens a bit too high up the call
> stack, where it has to deal with guard pages, compaction capturing,
> block stealing and even page isolation. This is subtle and fragile,
> and makes it difficult to hack on the code.
>
> Now that type violations on the freelists have been fixed, push the
> accounting down to where pages enter and leave the freelist.
>
> v3:
> - fix CONFIG_UNACCEPTED_MEMORY build (lkp)
> v2:
> - fix CONFIG_DEBUG_PAGEALLOC build (Mel)
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

<snip>

>
> /* Used for pages not on another list */
> -static inline void add_to_free_list_tail(struct page *page, struct zone *zone,
> - unsigned int order, int migratetype)
> +static inline void add_to_free_list(struct page *page, struct zone *zone,
> + unsigned int order, int migratetype,
> + bool tail)
> {
> struct free_area *area = &zone->free_area[order];
>
> - list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> + VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype,
> + "page type is %lu, passed migratetype is %d (nr=%d)\n",
> + get_pageblock_migratetype(page), migratetype, 1 << order);

Ok, IIUC so you now assume pageblock migratetype is now matching freelist
placement at all times. This is a change from the previous treatment as a
heuristic that may be sometimes imprecise. Let's assume the previous patches
handled the deterministic reasons why those would deviate (modulo my concern
about pageblocks spanning multiple zones in reply to 5/6).

But unless I'm missing something, I don't think the possible race scenarios
were dealt with? Pageblock migratetype is set under zone->lock but there are
places that read it outside of zone->lock and then trust it to perform the
freelist placement. See for example __free_pages_ok(), or free_unref_page()
in the cases it calls free_one_page(). These determine pageblock migratetype
before taking the zone->lock. Only for has_isolate_pageblock() cases we are
more careful, because previously isolation was the only case where precision
was needed. So I think this warning is going to trigger?

> +
> + if (tail)
> + list_add_tail(&page->buddy_list, &area->free_list[migratetype]);
> + else
> + list_add(&page->buddy_list, &area->free_list[migratetype]);
> area->nr_free++;
> +
> + account_freepages(page, zone, 1 << order, migratetype);
> }
>
> /*

<snip>

> @@ -757,23 +783,21 @@ static inline void __free_one_page(struct page *page,
> VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
>
> VM_BUG_ON(migratetype == -1);
> - if (likely(!is_migrate_isolate(migratetype)))
> - __mod_zone_freepage_state(zone, 1 << order, migratetype);
> -
> VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
> VM_BUG_ON_PAGE(bad_range(zone, page), page);
>
> while (order < MAX_ORDER) {
> - if (compaction_capture(capc, page, order, migratetype)) {
> - __mod_zone_freepage_state(zone, -(1 << order),
> - migratetype);
> + int buddy_mt;
> +
> + if (compaction_capture(capc, page, order, migratetype))
> return;
> - }
>
> buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
> if (!buddy)
> goto done_merging;
>
> + buddy_mt = get_pfnblock_migratetype(buddy, buddy_pfn);

You should assume buddy_mt equals migratetype, no? It's the same assumption
as the VM_WARN_ONCE() I've discussed?

> +
> if (unlikely(order >= pageblock_order)) {

Only here buddy_mt can differ and the code in this block already handles that.

> /*
> * We want to prevent merge between freepages on pageblock
> @@ -801,9 +825,9 @@ static inline void __free_one_page(struct page *page,
> * merge with it and move up one order.
> */
> if (page_is_guard(buddy))
> - clear_page_guard(zone, buddy, order, migratetype);
> + clear_page_guard(zone, buddy, order);
> else
> - del_page_from_free_list(buddy, zone, order);
> + del_page_from_free_list(buddy, zone, order, buddy_mt);

Ugh so this will add account_freepages() call to each iteration of the
__free_one_page() hot loop, which seems like a lot of unnecessary overhead -
as long as we are within pageblock_order the migratetype should be the same,
and thus also is_migrate_isolate() and is_migrate_cma() tests should return
the same value so we shouldn't need to call __mod_zone_page_state()
piecemeal like this.

> combined_pfn = buddy_pfn & pfn;
> page = page + (combined_pfn - pfn);
> pfn = combined_pfn;