Re: [syzbot] [arm?] WARNING in copy_highpage
From: David Hildenbrand
Date: Mon Oct 06 2025 - 09:26:06 EST
On 06.10.25 15:17, Catalin Marinas wrote:
On Mon, Oct 06, 2025 at 09:55:27AM +0200, David Hildenbrand wrote:
Modules linked in:
CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
Hardware name: linux,dummy-virt (DT)
pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
sp : ffff800088053940
x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
Call trace:
try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
copy_mc_highpage include/linux/highmem.h:383 [inline]
folio_mc_copy+0x44/0x6c mm/util.c:740
__migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
migrate_folio+0x1c/0x2c mm/migrate.c:882
move_to_new_folio+0x58/0x144 mm/migrate.c:1097
migrate_folio_move mm/migrate.c:1370 [inline]
migrate_folios_move mm/migrate.c:1719 [inline]
migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
migrate_pages_sync mm/migrate.c:2023 [inline]
migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
kernel_mbind mm/mempolicy.c:1682 [inline]
__do_sys_mbind mm/mempolicy.c:1756 [inline]
I don't think we ever stressed MTE with mbind before. I have a suspicion
this problem has been around for some time.
My reading of do_mbind() is that it ends up allocating pages to migrate
into via alloc_migration_target_by_mpol() -> folio_alloc_mpol(). Pages
returned should be untagged and uninitialised unless the PG_* flags were
not cleared on a prior free, or unless migrate_pages_batch() somehow
reuses some pages instead of reallocating them.
Staring at __migrate_folio(), I assume we can end up successfully calling
folio_mc_copy(), but then failing in __folio_migrate_mapping().
It seems to be as easy as folio_ref_freeze() failing in
__folio_migrate_mapping().
We return -EAGAIN in that case, making the caller retry and stumble into an
already-tagged page (with the same source / destination parameters, IIRC).
So this is likely simply us redoing the copy after the migration failed
following a successful copy.
Could it happen that we are calling it with a different source/destination
combination the second time? I don't think so, but I am not 100% sure.
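The sequence above can be modelled in plain userspace C. This is a toy sketch, not kernel code: `struct toy_page`, `toy_copy_highpage()`, `toy_migrate_mapping()` and `toy_migrate()` are hypothetical stand-ins for the page flags, the arm64 copy_highpage(), __folio_migrate_mapping() and the retry in migrate_pages_batch(), and the `tagged` field stands in for PG_mte_tagged. It assumes the retry reuses the same destination, as discussed above, and shows that the second copy then finds the destination already tagged, which is the condition the trace reports via try_page_mte_tagging().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Toy page: 'tagged' stands in for PG_mte_tagged. */
struct toy_page {
	bool tagged;
	unsigned char tags[4];	/* stand-in for the MTE tag storage */
};

static int warn_count;	/* counts hits of the "already tagged" warning */

/* Models the arm64 copy path, which expects a fresh destination. */
static void toy_copy_highpage(struct toy_page *to, const struct toy_page *from)
{
	if (!from->tagged)
		return;
	if (to->tagged)		/* "new page, shouldn't be tagged yet" */
		warn_count++;
	memcpy(to->tags, from->tags, sizeof(to->tags));
	to->tagged = true;
}

/* Models the mapping update: fails while a transient reference exists. */
static int toy_migrate_mapping(int *extra_refs)
{
	if (*extra_refs > 0) {
		(*extra_refs)--;	/* the transient reference goes away */
		return -EAGAIN;
	}
	return 0;
}

/* Models the caller retrying with the same source and destination. */
static int toy_migrate(struct toy_page *dst, const struct toy_page *src,
		       int extra_refs, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		toy_copy_highpage(dst, src);	/* the copy succeeds ... */
		if (toy_migrate_mapping(&extra_refs) != -EAGAIN)
			return 0;		/* ... and so does the mapping */
		/* -EAGAIN: dst is kept for the retry, but is now tagged */
	}
	return -EAGAIN;
}
```

With one transient reference, the migration eventually succeeds but the warning fires exactly once, on the retry; with no extra reference it never fires.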
Thanks David. I can now see how it would retry on the same pages without
reallocating. At least we know it's not causing any side-effects, only
messing up the MTE safety warnings.
As long as the folio is not getting reused elsewhere, yes.
I haven't yet fully understood whether there could be cases where we use the folio for another source. But I think it's not trivially possible, because I think we allocate dst folios based on source-folio properties (order, node, zone, etc).
The most reliable way would be to un-tag the destination if folio_mc_copy()
succeeded but __folio_migrate_mapping() failed.
Clearing an MTE specific flag in the core code doesn't look great. Also
going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
side-effects as we don't know where the page is coming from (we have
those get_new_folio()/put_new_folio() arguments passed on by higher up
callers).
As an alternative, I would probably have provided something like a simple folio_mc_copy_abort().
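One possible reading of the folio_mc_copy_abort() idea, again as a self-contained userspace toy rather than kernel code. folio_mc_copy_abort() is only a proposal here; `toy_copy_abort()` is a hypothetical hook that rolls back what the copy did to the destination (in this model only the `tagged` flag) when the mapping update fails after a successful copy, so that the retry sees a fresh destination. How such a hook would be routed to the architecture code, so the core code does not clear an MTE-specific flag itself, is left open as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_page {
	bool tagged;	/* stand-in for PG_mte_tagged */
};

static int warn_count;	/* counts "destination already tagged" hits */

/* Models the arch copy: tags are copied and the flag is set. */
static void toy_copy(struct toy_page *to, const struct toy_page *from)
{
	if (!from->tagged)
		return;
	if (to->tagged)
		warn_count++;
	to->tagged = true;
}

/* Hypothetical abort hook: undo the copy's side effects on 'to'. */
static void toy_copy_abort(struct toy_page *to)
{
	to->tagged = false;
}

/* extra_refs models how many mapping updates fail with -EAGAIN. */
static int toy_migrate(struct toy_page *dst, const struct toy_page *src,
		       int extra_refs, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		toy_copy(dst, src);
		if (extra_refs == 0)
			return 0;	/* mapping update succeeded */
		extra_refs--;
		toy_copy_abort(dst);	/* roll back before retrying */
	}
	return -EAGAIN;
}
```

In this model the warning never fires across retries, and a migration that gives up leaves the destination untagged, which matters if the put_new_folio() path hands it elsewhere.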
I'm tempted to just drop the warning in the arm64 copy_highpage() and
replace it with a comment about migration retrying on a potentially
tagged page. It will have to override the tags each time (as it
currently does, but also warns).
Works for me. Maybe we could warn if the tag would change, because I think that once we have unmapped the folio during migration, the tag can no longer change.
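A toy userspace model of the combined proposal: always overwrite the destination tags, and warn only if a copy onto an already-tagged destination would change them. `struct toy_page`, `TOY_NR_TAGS` and `toy_copy_highpage()` are hypothetical stand-ins, not the arm64 implementation; the model assumes, as suggested above, that the source tags are stable once the folio is unmapped, so a retry with the same source should be silent.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_TAGS 4	/* hypothetical: tag granules per toy page */

struct toy_page {
	bool tagged;				/* stand-in for PG_mte_tagged */
	unsigned char tags[TOY_NR_TAGS];	/* stand-in for tag storage */
};

static int warn_count;	/* counts "tags would change" warnings */

/*
 * The destination may already be tagged if migration is retrying after
 * a failure that followed a successful copy. Tags are always
 * overwritten; warn only if the retried copy would change them.
 */
static void toy_copy_highpage(struct toy_page *to, const struct toy_page *from)
{
	if (!from->tagged)
		return;
	if (to->tagged && memcmp(to->tags, from->tags, sizeof(to->tags)) != 0)
		warn_count++;
	memcpy(to->tags, from->tags, sizeof(to->tags));
	to->tagged = true;
}
```

In this model the comparison only runs when the destination is already tagged, i.e. on the retry path, so the first-copy path stays as cheap as before.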
--
Cheers
David / dhildenb