Re: [PATCH v2 2/2] mm,page_owner: Fix accounting of pages when migrating

From: Oscar Salvador
Date: Wed Mar 20 2024 - 00:58:53 EST


On Tue, Mar 19, 2024 at 06:48:31PM +0000, Matthew Wilcox wrote:
> Is this the right way to fix this problem? I would have thought we'd
> be better off accounting this as migration freeing the old page and
> allocating the new page. If I understand correctly, this is the code
> which says "This page was last allocated by X and freed by Y", and I
> would think that being last freed (or allocated) by the migration code
> would be a very nice hint about where a problem might stem from.

I hear you, and I had the same thought when I stumbled upon this.
I did not know that the handle was being changed; had I known, it would
have saved me quite a lot of debugging time.

Checking the history of this, I can see this decision was made in
2016 in:

commit d435edca928805074dae005ab9a42d9fa60fc702
Author: Vlastimil Babka <vbabka@xxxxxxx>
Date: Tue Mar 15 14:56:15 2016 -0700

mm, page_owner: copy page owner info during migration


And let me quote:

"The page_owner mechanism stores gfp_flags of an allocation and stack
trace that lead to it. During page migration, the original information
is practically replaced by the allocation of free page as the migration
target. Arguably this is less useful and might lead to all the
page_owner info for migratable pages gradually converge towards
compaction or numa balancing migrations. It has also lead to
inaccuracies such as one fixed by commit e2cfc91120fa ("mm/page_owner:
set correct gfp_mask on page_owner")."

A follow-up patch stored the migration reason in last_migrate_reason,
and also added a line of output whenever last_migrate_reason was
other than -1:

+ if (page_ext->last_migrate_reason != -1) {
+ ret += snprintf(kbuf + ret, count - ret,
+ "Page has been migrated, last migrate reason: %s\n",
+ migrate_reason_names[page_ext->last_migrate_reason]);
+ if (ret >= count)
+ goto err;
+ }
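To make the -1 sentinel and the truncation check in that hunk concrete,
here is a minimal userspace sketch. It is not kernel code: struct
demo_owner, demo_reason_names[] and demo_print_migrate_reason() are
hypothetical stand-ins for the page_ext fields and migrate_reason_names[],
and the reason list is an illustrative subset only.

```c
#include <stdio.h>

/* Hypothetical subset of migration reasons; the kernel's real
 * migrate_reason_names[] is longer and may be ordered differently. */
enum demo_migrate_reason { DEMO_MR_COMPACTION, DEMO_MR_NUMA_MISPLACED, DEMO_MR_NR };

static const char *const demo_reason_names[DEMO_MR_NR] = {
	"compaction",
	"numa_misplaced",
};

/* Hypothetical stand-in for the relevant page_owner/page_ext field. */
struct demo_owner {
	short last_migrate_reason; /* -1 means "never migrated" */
};

/*
 * Append the migration line in the same style as the quoted hunk.
 * Returns the new offset into kbuf, or -1 if the output was truncated
 * (where the kernel code does "goto err").
 */
static int demo_print_migrate_reason(char *kbuf, size_t count, int ret,
				     const struct demo_owner *o)
{
	if (o->last_migrate_reason != -1) {
		ret += snprintf(kbuf + ret, count - ret,
				"Page has been migrated, last migrate reason: %s\n",
				demo_reason_names[o->last_migrate_reason]);
		/* snprintf returns the length it wanted to write */
		if ((size_t)ret >= count)
			return -1;
	}
	return ret;
}
```

A page that was never migrated adds nothing to the output; a migrated
one gets the extra line, which is the hint that survives the copy.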

Now, thinking about this some more, it kind of makes sense, because in
my experience one of the main things page_owner is used for is hunting
down memory leaks.
We print the output and correlate allocation/free operations per
stack, so one can easily spot a stack that just keeps allocating memory
and never frees it (that might be legitimate rather than a leak, but it
gives a hint).
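The per-stack correlation described above boils down to something like
the following userspace sketch. It assumes a hypothetical, heavily
simplified record (struct demo_record) with a stack handle, an order and
a "still allocated" flag; real page_owner output carries much more
(gfp mask, pid, timestamps, the full stack), and in practice this is
done on the dumped text rather than in C.

```c
#include <stddef.h>

/* Hypothetical simplified page_owner record. */
struct demo_record {
	unsigned int handle;   /* stands in for the stack depot handle */
	unsigned int order;    /* allocation covers 2^order pages */
	int allocated;         /* 1 if still allocated, 0 if freed */
};

/* Count pages still allocated from the given stack handle. */
static unsigned long demo_pages_for_stack(const struct demo_record *recs,
					  size_t n, unsigned int handle)
{
	unsigned long pages = 0;

	for (size_t i = 0; i < n; i++)
		if (recs[i].allocated && recs[i].handle == handle)
			pages += 1UL << recs[i].order;
	return pages;
}

/* Return the handle owning the most live pages (0 if there are none).
 * Quadratic, which is fine for a sketch. */
static unsigned int demo_top_stack(const struct demo_record *recs, size_t n)
{
	unsigned int best = 0;
	unsigned long best_pages = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long p = demo_pages_for_stack(recs, n, recs[i].handle);

		if (p > best_pages) {
			best_pages = p;
			best = recs[i].handle;
		}
	}
	return best;
}
```

The stack returned by demo_top_stack() is the "keeps allocating and
never frees" candidate; whether it is really a leak still needs a human
to look at it.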

Now imagine there are 10k pages pointing to stack A.
If those pages were migrated (e.g. because kcompactd kicks in), we would
lose the original stack and replace it with something like:

migrate_pages()
..
..
kcompactd

After that, stack A would no longer have those 10k pages pointing to
it, although it still "owns" them; they just got replaced by new pages
due to migration.

That would kind of defeat the purpose of page_owner.
And after all, we do record some migration information in the new
pages, which gives us a hint when looking at the output.
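The difference between the two accounting policies for the 10k-page
example can be modelled in a few lines. This is a userspace sketch only:
the DEMO_STACK_* handles, struct demo_page and both policies are
hypothetical simplifications. DEMO_COPY_OWNER mirrors what d435edca does
(carry the owner over and note the reason), while DEMO_FREE_AND_ALLOC
models treating migration as a free plus a fresh allocation.

```c
#include <stddef.h>

/* Hypothetical handles standing in for stack depot handles. */
#define DEMO_STACK_A       1u  /* the original allocating stack */
#define DEMO_STACK_MIGRATE 2u  /* migrate_pages() ... kcompactd */

struct demo_page {
	unsigned int handle;
	int last_migrate_reason; /* -1: never migrated */
};

enum demo_policy {
	DEMO_COPY_OWNER,     /* keep the original owner, record the reason */
	DEMO_FREE_AND_ALLOC, /* new page is owned by the migration path */
};

/* Fill in the owner info of dst when migrating src under a policy. */
static void demo_migrate(const struct demo_page *src, struct demo_page *dst,
			 enum demo_policy policy, int reason)
{
	dst->handle = (policy == DEMO_COPY_OWNER) ? src->handle
						  : DEMO_STACK_MIGRATE;
	/* Either way, the migration itself leaves a trace. */
	dst->last_migrate_reason = reason;
}

/* Migrate n pages owned by stack A; count how many still point at A. */
static size_t demo_count_after_migration(size_t n, enum demo_policy policy)
{
	size_t owned_by_a = 0;

	for (size_t i = 0; i < n; i++) {
		struct demo_page old = { DEMO_STACK_A, -1 };
		struct demo_page dst;

		demo_migrate(&old, &dst, policy, 0 /* e.g. compaction */);
		if (dst.handle == DEMO_STACK_A)
			owned_by_a++;
	}
	return owned_by_a;
}
```

With 10000 pages, copying the owner leaves all 10000 attributed to
stack A, while the free-and-allocate policy leaves none, which is
exactly the loss of signal described above.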

So, all in all, I am leaning towards "this is fine".


--
Oscar Salvador
SUSE Labs