Re: v5.18-rc1: migratepages triggers VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
From: Zi Yan
Date: Mon Apr 04 2022 - 10:05:11 EST
On 4 Apr 2022, at 9:29, Naoya Horiguchi wrote:
> Hi,
>
> I found that the below VM_BUG_ON_FOLIO is triggered on v5.18-rc1
> (and also reproducible with mmotm on 3/31).
> I have no idea about the bug's mechanism, but it does not seem to
> have been reported on LKML yet, so let me just share it. config.gz is attached.
>
> This is easily reproduced, for example, by running the migratepages(8)
> command on any running process (such as PID 1).
>
> Could anyone help me solve this?
>
> Thanks,
> Naoya Horiguchi
>
> [ 48.206424] page:0000000021452e3a refcount:6 mapcount:0 mapping:000000003aaf5253 index:0x0 pfn:0x14e600
> [ 48.213316] head:0000000021452e3a order:9 compound_mapcount:0 compound_pincount:0
> [ 48.218830] aops:xfs_address_space_operations [xfs] ino:dee dentry name:"libc.so.6"
> [ 48.225098] flags: 0x57ffffc0012027(locked|referenced|uptodate|active|private|head|node=1|zone=2|lastcpupid=0x1fffff)
> [ 48.232792] raw: 0057ffffc0012027 0000000000000000 dead000000000122 ffff8a0dc9a376b8
> [ 48.238464] raw: 0000000000000000 ffff8a0dc6b23d20 00000006ffffffff 0000000000000000
> [ 48.244109] page dumped because: VM_BUG_ON_FOLIO(folio_nr_pages(old) != nr_pages)
> [ 48.249196] ------------[ cut here ]------------
> [ 48.251240] kernel BUG at mm/memcontrol.c:6857!
> [ 48.253896] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [ 48.255377] CPU: 5 PID: 844 Comm: migratepages Tainted: G E 5.18.0-rc1-v5.18-rc1-220404-1637-000-rc1+ #39
> [ 48.258251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [ 48.260535] RIP: 0010:mem_cgroup_migrate+0x217/0x320
> [ 48.261914] Code: 48 89 ef e8 5b 2c f7 ff 0f 0b 48 c7 c6 e8 64 5b b9 48 89 ef e8 4a 2c f7 ff 0f 0b 48 c7 c6 28 65 5b b9 48 89 ef e8 39 2c f7 ff <0f> 0b e8 12 79 e0 ff 49 8b 45 10 a8 03 0f 85 d2 00 00 00 65 48 ff
> [ 48.268541] RSP: 0018:ffffa19b41b77a20 EFLAGS: 00010286
> [ 48.270245] RAX: 0000000000000045 RBX: 0000000000000200 RCX: 0000000000000000
> [ 48.272494] RDX: 0000000000000001 RSI: ffffffffb9599561 RDI: 00000000ffffffff
> [ 48.274726] RBP: ffffe30f85398000 R08: 0000000000000000 R09: 00000000ffffdfff
> [ 48.276969] R10: ffffa19b41b77810 R11: ffffffffb9940d08 R12: 0000000000000000
> [ 48.279136] R13: ffffe30f85398000 R14: ffff8a0dc6b23d20 R15: 0000000000000200
> [ 48.281151] FS: 00007fadd1182740(0000) GS:ffff8a0efbc80000(0000) knlGS:0000000000000000
> [ 48.283422] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 48.285059] CR2: 00007fadd118b090 CR3: 0000000144432005 CR4: 0000000000170ee0
> [ 48.286942] Call Trace:
> [ 48.287665] <TASK>
> [ 48.288255] iomap_migrate_page+0x64/0x190
> [ 48.289366] move_to_new_page+0xa3/0x470
> [ 48.290448] ? page_not_mapped+0xa/0x20
> [ 48.291491] ? rmap_walk_file+0xe1/0x1f0
> [ 48.292503] ? try_to_migrate+0x8e/0xd0
> [ 48.293524] migrate_pages+0x166e/0x1870
> [ 48.294607] ? migrate_page+0xe0/0xe0
> [ 48.295761] ? walk_page_range+0x9a/0x110
> [ 48.296885] migrate_to_node+0xea/0x120
> [ 48.297873] do_migrate_pages+0x23c/0x2a0
> [ 48.298925] kernel_migrate_pages+0x3f5/0x470
> [ 48.300149] __x64_sys_migrate_pages+0x19/0x20
> [ 48.301371] do_syscall_64+0x3b/0x90
> [ 48.302340] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 48.303789] RIP: 0033:0x7fadd0f0af3d
> [ 48.304957] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb ee 0e 00 f7 d8 64 89 01 48
> [ 48.310983] RSP: 002b:00007fff5997e178 EFLAGS: 00000246 ORIG_RAX: 0000000000000100
> [ 48.313444] RAX: ffffffffffffffda RBX: 0000556a722bf120 RCX: 00007fadd0f0af3d
> [ 48.315763] RDX: 0000556a722bf140 RSI: 0000000000000401 RDI: 000000000000034a
> [ 48.318070] RBP: 000000000000034a R08: 0000000000000000 R09: 0000000000000003
> [ 48.320370] R10: 0000556a722bf1f0 R11: 0000000000000246 R12: 0000556a722bf1d0
> [ 48.322679] R13: 000000000000034a R14: 00007fadd11cec00 R15: 0000556a71a59d50
> [ 48.324998] </TASK>
Is it because the migration code assumes that all THPs have order=HPAGE_PMD_ORDER?
Would the patch below fix the issue?
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a2516d31db6c..358b7c11426d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1209,7 +1209,7 @@ static struct page *new_page(struct page *page, unsigned long start)
 		struct page *thp;
 
 		thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
-					 HPAGE_PMD_ORDER);
+					 thp_order(page));
 		if (!thp)
 			return NULL;
 		prep_transhuge_page(thp);
diff --git a/mm/migrate.c b/mm/migrate.c
index de175e2fdba5..79e4b36f709a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,7 +1547,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		 */
 		gfp_mask &= ~__GFP_RECLAIM;
 		gfp_mask |= GFP_TRANSHUGE;
-		order = HPAGE_PMD_ORDER;
+		order = thp_order(page);
 	}
 	zidx = zone_idx(page_zone(page));
 	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
--
Best Regards,
Yan, Zi