Re: [PATCH V2 0/6] mm: page_alloc: freelist migratetype hygiene

From: Mike Kravetz
Date: Sat Sep 16 2023 - 15:59:12 EST


On 09/15/23 10:16, Johannes Weiner wrote:
> On Thu, Sep 14, 2023 at 04:52:38PM -0700, Mike Kravetz wrote:
> > In next-20230913, I started hitting the following BUG. It seems related
> > to this series; if the series is reverted, I do not see the BUG.
> >
> > I can easily reproduce this on a small 16G VM. The kernel command line
> > contains "hugetlb_free_vmemmap=on hugetlb_cma=4G". Then run the script:
> > while true; do
> >     echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
> >     echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
> >     echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > done
> >
> > For the BUG below, I believe it was the first (or second) 1G page
> > allocation from CMA that triggered it: a cma_alloc of 1G.
> >
> > Sorry, have not looked deeper into the issue.
>
> Thanks for the report, and sorry about the breakage!
>
> I was scratching my head at this:
>
> 	/* MIGRATE_ISOLATE page should not go to pcplists */
> 	VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
>
> because there is nothing in page isolation that prevents setting
> MIGRATE_ISOLATE on something that's on the pcplist already. So why
> didn't this trigger before already?
>
> Then it clicked: it used to only check the *pcpmigratetype* determined
> by free_unref_page(), which of course mustn't be MIGRATE_ISOLATE.
>
> Pages that get isolated while *already* on the pcplist are fine, and
> are handled properly:
>
> 	mt = get_pcppage_migratetype(page);
>
> 	/* MIGRATE_ISOLATE page should not go to pcplists */
> 	VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
>
> 	/* Pageblock could have been isolated meanwhile */
> 	if (unlikely(isolated_pageblocks))
> 		mt = get_pageblock_migratetype(page);
>
> So this was purely a sanity check against the pcpmigratetype cache
> operations. With that gone, we can remove it.

With the patch below applied, a slightly different workload triggers the
following warnings. They seem related, and go away when the series is
reverted.

[ 331.595382] ------------[ cut here ]------------
[ 331.596665] page type is 5, passed migratetype is 1 (nr=512)
[ 331.598121] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:662 expand+0x1c9/0x200
[ 331.600549] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq 9p snd_seq_device netfs 9pnet_virtio snd_pcm joydev snd_timer virtio_balloon snd soundcore 9pnet virtio_blk virtio_console virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[ 331.609530] CPU: 2 PID: 935 Comm: bash Tainted: G W 6.6.0-rc1-next-20230913+ #26
[ 331.611603] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 331.613527] RIP: 0010:expand+0x1c9/0x200
[ 331.614492] Code: 89 ef be 07 00 00 00 c6 05 c9 b1 35 01 01 e8 de f7 ff ff 8b 4c 24 30 8b 54 24 0c 48 c7 c7 68 9f 22 82 48 89 c6 e8 97 b3 df ff <0f> 0b e9 db fe ff ff 48 c7 c6 f8 9f 22 82 48 89 df e8 41 e3 fc ff
[ 331.618540] RSP: 0018:ffffc90003c97a88 EFLAGS: 00010086
[ 331.619801] RAX: 0000000000000000 RBX: ffffea0007ff8000 RCX: 0000000000000000
[ 331.621331] RDX: 0000000000000005 RSI: ffffffff8224dce6 RDI: 00000000ffffffff
[ 331.622914] RBP: 00000000001ffe00 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 331.624712] R10: 00000000ffffdfff R11: ffffffff824660c0 R12: ffff88827fffcd80
[ 331.626317] R13: 0000000000000009 R14: 0000000000000200 R15: 000000000000000a
[ 331.627810] FS: 00007f24b3932740(0000) GS:ffff888477c00000(0000) knlGS:0000000000000000
[ 331.630593] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 331.631865] CR2: 0000560a53875018 CR3: 000000017eee8003 CR4: 0000000000370ee0
[ 331.633382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 331.634873] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 331.636324] Call Trace:
[ 331.636934] <TASK>
[ 331.637521] ? expand+0x1c9/0x200
[ 331.638320] ? __warn+0x7d/0x130
[ 331.639116] ? expand+0x1c9/0x200
[ 331.639957] ? report_bug+0x18d/0x1c0
[ 331.640832] ? handle_bug+0x41/0x70
[ 331.641635] ? exc_invalid_op+0x13/0x60
[ 331.642522] ? asm_exc_invalid_op+0x16/0x20
[ 331.643494] ? expand+0x1c9/0x200
[ 331.644264] ? expand+0x1c9/0x200
[ 331.645007] rmqueue_bulk+0xf4/0x530
[ 331.645847] get_page_from_freelist+0x3ed/0x1040
[ 331.646837] ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[ 331.647977] __alloc_pages+0xec/0x240
[ 331.648783] alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[ 331.649912] __alloc_fresh_hugetlb_folio+0x157/0x230
[ 331.650938] alloc_pool_huge_folio+0xad/0x110
[ 331.651909] set_max_huge_pages+0x17d/0x390
[ 331.652760] nr_hugepages_store_common+0x91/0xf0
[ 331.653825] kernfs_fop_write_iter+0x108/0x1f0
[ 331.654986] vfs_write+0x207/0x400
[ 331.655925] ksys_write+0x63/0xe0
[ 331.656832] do_syscall_64+0x37/0x90
[ 331.657793] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 331.660398] RIP: 0033:0x7f24b3a26e87
[ 331.661342] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 331.665673] RSP: 002b:00007ffccd603de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 331.667541] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24b3a26e87
[ 331.669197] RDX: 0000000000000005 RSI: 0000560a5381bb50 RDI: 0000000000000001
[ 331.670883] RBP: 0000560a5381bb50 R08: 000000000000000a R09: 00007f24b3abe0c0
[ 331.672536] R10: 00007f24b3abdfc0 R11: 0000000000000246 R12: 0000000000000005
[ 331.674175] R13: 00007f24b3afa520 R14: 0000000000000005 R15: 00007f24b3afa720
[ 331.675841] </TASK>
[ 331.676450] ---[ end trace 0000000000000000 ]---
[ 331.677659] ------------[ cut here ]------------
[ 331.679109] page type is 5, passed migratetype is 1 (nr=512)
[ 331.680376] WARNING: CPU: 2 PID: 935 at mm/page_alloc.c:699 del_page_from_free_list+0x137/0x170
[ 331.682314] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq 9p snd_seq_device netfs 9pnet_virtio snd_pcm joydev snd_timer virtio_balloon snd soundcore 9pnet virtio_blk virtio_console virtio_net net_failover failover crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[ 331.691852] CPU: 2 PID: 935 Comm: bash Tainted: G W 6.6.0-rc1-next-20230913+ #26
[ 331.694026] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 331.696162] RIP: 0010:del_page_from_free_list+0x137/0x170
[ 331.697589] Code: c6 05 a0 b5 35 01 01 e8 b7 fb ff ff 44 89 f1 44 89 e2 48 c7 c7 68 9f 22 82 48 89 c6 b8 01 00 00 00 d3 e0 89 c1 e8 69 b7 df ff <0f> 0b e9 03 ff ff ff 48 c7 c6 a0 9f 22 82 48 89 df e8 13 e7 fc ff
[ 331.702060] RSP: 0018:ffffc90003c97ac8 EFLAGS: 00010086
[ 331.703430] RAX: 0000000000000000 RBX: ffffea0007ff8000 RCX: 0000000000000000
[ 331.705284] RDX: 0000000000000005 RSI: ffffffff8224dce6 RDI: 00000000ffffffff
[ 331.707101] RBP: 00000000001ffe00 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 331.708933] R10: 00000000ffffdfff R11: ffffffff824660c0 R12: 0000000000000001
[ 331.710754] R13: ffff88827fffcd80 R14: 0000000000000009 R15: 0000000000000009
[ 331.712637] FS: 00007f24b3932740(0000) GS:ffff888477c00000(0000) knlGS:0000000000000000
[ 331.714861] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 331.716466] CR2: 0000560a53875018 CR3: 000000017eee8003 CR4: 0000000000370ee0
[ 331.718441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 331.720372] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 331.723583] Call Trace:
[ 331.724351] <TASK>
[ 331.725045] ? del_page_from_free_list+0x137/0x170
[ 331.726370] ? __warn+0x7d/0x130
[ 331.727326] ? del_page_from_free_list+0x137/0x170
[ 331.728637] ? report_bug+0x18d/0x1c0
[ 331.729688] ? handle_bug+0x41/0x70
[ 331.730707] ? exc_invalid_op+0x13/0x60
[ 331.731798] ? asm_exc_invalid_op+0x16/0x20
[ 331.733007] ? del_page_from_free_list+0x137/0x170
[ 331.734317] ? del_page_from_free_list+0x137/0x170
[ 331.735649] rmqueue_bulk+0xdf/0x530
[ 331.736741] get_page_from_freelist+0x3ed/0x1040
[ 331.738069] ? prepare_alloc_pages.constprop.0+0x197/0x1b0
[ 331.739578] __alloc_pages+0xec/0x240
[ 331.740666] alloc_buddy_hugetlb_folio.isra.0+0x6a/0x150
[ 331.742135] __alloc_fresh_hugetlb_folio+0x157/0x230
[ 331.743521] alloc_pool_huge_folio+0xad/0x110
[ 331.744768] set_max_huge_pages+0x17d/0x390
[ 331.745988] nr_hugepages_store_common+0x91/0xf0
[ 331.747306] kernfs_fop_write_iter+0x108/0x1f0
[ 331.748651] vfs_write+0x207/0x400
[ 331.749735] ksys_write+0x63/0xe0
[ 331.750808] do_syscall_64+0x37/0x90
[ 331.753203] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 331.754857] RIP: 0033:0x7f24b3a26e87
[ 331.756184] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 331.760239] RSP: 002b:00007ffccd603de8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 331.761935] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24b3a26e87
[ 331.763524] RDX: 0000000000000005 RSI: 0000560a5381bb50 RDI: 0000000000000001
[ 331.765102] RBP: 0000560a5381bb50 R08: 000000000000000a R09: 00007f24b3abe0c0
[ 331.766740] R10: 00007f24b3abdfc0 R11: 0000000000000246 R12: 0000000000000005
[ 331.768344] R13: 00007f24b3afa520 R14: 0000000000000005 R15: 00007f24b3afa720
[ 331.769949] </TASK>
[ 331.770559] ---[ end trace 0000000000000000 ]---

--
Mike Kravetz

> ---
>
> From b0cb92ed10b40fab0921002effa8b726df245790 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date: Fri, 15 Sep 2023 09:59:52 -0400
> Subject: [PATCH] mm: page_alloc: remove pcppage migratetype caching fix
>
> Mike reports the following crash in -next:
>
> [ 28.643019] page:ffffea0004fb4280 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13ed0a
> [ 28.645455] flags: 0x200000000000000(node=0|zone=2)
> [ 28.646835] page_type: 0xffffffff()
> [ 28.647886] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
> [ 28.651170] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> [ 28.653124] page dumped because: VM_BUG_ON_PAGE(is_migrate_isolate(mt))
> [ 28.654769] ------------[ cut here ]------------
> [ 28.655972] kernel BUG at mm/page_alloc.c:1231!
>
> This VM_BUG_ON() used to check that the cached pcppage_migratetype set
> by free_unref_page() wasn't MIGRATE_ISOLATE.
>
> When I removed the caching, I erroneously changed the assert to check
> that no isolated pages are on the pcplist. This is quite different,
> because pages can be isolated *after* they had been put on the
> freelist already (which is handled just fine).
>
> IOW, this was purely a sanity check on the migratetype caching. With
> that gone, the check should have been removed as well. Do that now.
>
> Reported-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
> mm/page_alloc.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e3f1c777feed..9469e4660b53 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1207,9 +1207,6 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  		count -= nr_pages;
>  		pcp->count -= nr_pages;
>  
> -		/* MIGRATE_ISOLATE page should not go to pcplists */
> -		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> -
>  		__free_one_page(page, pfn, zone, order, mt, FPI_NONE);
>  		trace_mm_page_pcpu_drain(page, order, mt);
>  	} while (count > 0 && !list_empty(list));
> --
> 2.42.0
>