Re: [PATCH v2 01/11] mm/vmstat: remove remote node draining

From: David Hildenbrand
Date: Thu Mar 02 2023 - 05:11:04 EST


[...]


(2) drain_zone_pages() documents that we're draining the PCP
(bulk-freeing them) of the current CPU on remote nodes. That bulk-
freeing will properly adjust free memory counters. What exactly is
the impact when no longer doing that? Won't the "snapshot" of some
counters eventually be wrong? Do we care?

I don't see why the snapshot of the counters would be wrong.
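
For reference, drain_zone_pages() just bulk-frees the current CPU's
pageset of a remote zone via free_pcppages_bulk(), which maintains the
free page counters. Roughly, from mm/page_alloc.c (lightly trimmed; the
exact code may differ in this series' base tree):

/*
 * Called from the vmstat counter updater to drain pagesets of this
 * currently executing processor on remote nodes after they have
 * expired.
 */
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
{
        int to_drain, batch;

        batch = READ_ONCE(pcp->batch);
        to_drain = min(pcp->count, batch);
        if (to_drain > 0) {
                spin_lock(&pcp->lock);
                free_pcppages_bulk(zone, to_drain, pcp, 0);
                spin_unlock(&pcp->lock);
        }
}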

Instead of freeing pages on the PCP lists of remote nodes after they are
considered idle ("3 seconds idle till flush"), what will happen is that
drain_all_pages() will free those PCPs on demand, for example when an
allocation fails after direct reclaim (in __alloc_pages_direct_reclaim()):

        page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

        /*
         * If an allocation failed after direct reclaim, it could be because
         * pages are pinned on the per-cpu lists or in high alloc reserves.
         * Shrink them and try again
         */
        if (!page && !drained) {
                unreserve_highatomic_pageblock(ac, false);
                drain_all_pages(NULL);
                drained = true;
                goto retry;
        }

In both cases (drain_zone_pages() before, drain_all_pages() now) the pages
are freed, and the counters maintained, via free_pcppages_bulk() ->
__free_one_page():

static inline void __free_one_page(struct page *page,
                unsigned long pfn,
                struct zone *zone, unsigned int order,
                int migratetype, fpi_t fpi_flags)
{
        struct capture_control *capc = task_capc(zone);
        unsigned long buddy_pfn = 0;
        unsigned long combined_pfn;
        struct page *buddy;
        bool to_tail;

        VM_BUG_ON(!zone_is_initialized(zone));
        VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);

        VM_BUG_ON(migratetype == -1);
        if (likely(!is_migrate_isolate(migratetype)))
                __mod_zone_freepage_state(zone, 1 << order, migratetype);

        VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
        VM_BUG_ON_PAGE(bad_range(zone, page), page);

        while (order < MAX_ORDER - 1) {
                if (compaction_capture(capc, page, order, migratetype)) {
                        __mod_zone_freepage_state(zone, -(1 << order),
                                                  migratetype);
                        return;
                }
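
For context, the "3 seconds idle till flush" countdown that this patch
removes lives in refresh_cpu_vm_stats(). A simplified sketch of the
pre-patch mm/vmstat.c logic (trimmed to the relevant parts; details may
differ from this series' exact base):

static int refresh_cpu_vm_stats(bool do_pagesets)
{
        ...
        for_each_populated_zone(zone) {
                struct per_cpu_pages __percpu *pcp = zone->per_cpu_pageset;

                /* folding a non-zero zone counter diff re-arms the timer: */
                /* 3 seconds idle till flush */
                __this_cpu_write(pcp->expire, 3);

                if (do_pagesets) {
                        /*
                         * We never drain zones local to this processor.
                         */
                        if (zone_to_nid(zone) == numa_node_id()) {
                                __this_cpu_write(pcp->expire, 0);
                                continue;
                        }

                        /* decremented on each vmstat update interval */
                        if (__this_cpu_dec_return(pcp->expire))
                                continue;

                        if (__this_cpu_read(pcp->count)) {
                                drain_zone_pages(zone, this_cpu_ptr(pcp));
                                changes++;
                        }
                }
        }
        ...
}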

Describing the difference between the instructed refresh of vmstat and
"remotely draining per-cpu lists" in order to move free memory from the
pcp to the buddy would be great.

The difference is that remote PCPs are now drained on demand, when memory
is low, either via kcompactd or direct reclaim (through drain_all_pages()).

For example, with the following test on a tmpfs filesystem:

        dd if=/dev/zero of=file bs=1M count=32000

kcompactd0-116 [005] ...1 228232.042873: drain_all_pages <-kcompactd_do_work
kcompactd0-116 [005] ...1 228232.042873: __drain_all_pages <-kcompactd_do_work
dd-479485 [003] ...1 228232.455130: __drain_all_pages <-__alloc_pages_slowpath.constprop.0
dd-479485 [011] ...1 228232.721994: __drain_all_pages <-__alloc_pages_slowpath.constprop.0
gnome-shell-3750 [015] ...1 228232.723729: __drain_all_pages <-__alloc_pages_slowpath.constprop.0
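
where drain_all_pages() spills all the per-CPU pages from all CPUs back
into the buddy allocator; it is a thin wrapper around __drain_all_pages()
(mm/page_alloc.c, roughly):

/*
 * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
 */
void drain_all_pages(struct zone *zone)
{
        __drain_all_pages(zone, false);
}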

The commit message was indeed incorrect. Here is an updated one:

"mm/vmstat: remove remote node draining

Draining of pages from the local pcp for a remote zone should not be
necessary, since once the system is low on memory (or compaction on a
zone is in effect), drain_all_pages should be called freeing any unused
pcps."

Thanks!

Thanks for the explanation; that makes sense to me. Feel free to add my

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

... hoping that some others (Mel, Vlastimil?) can have another look.

--
Thanks,

David / dhildenb