Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840

From: David Hildenbrand
Date: Tue May 28 2024 - 09:58:15 EST


On 28.05.24 at 08:05, Mikhail Gavrilov wrote:
On Thu, May 23, 2024 at 12:05 PM Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:

On Thu, May 9, 2024 at 10:50 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

Do you have the other stacktrace as well?

Maybe triggering memory reclaim (e.g., using "stress" or "memhog") could
trigger it; that might be reasonable to try. Once we have a reproducer
we could at least bisect.
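
For illustration only, the kind of memory pressure meant here can be sketched in a few lines of userspace C. This is not a replacement for "stress" or "memhog"; the helper name hog_memory() is made up, and the sketch only allocates a buffer and writes to every page so that the memory actually gets backed. Whether that is enough to push the system into reclaim depends on how much is requested relative to free memory.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate `bytes` of anonymous memory and write one
 * byte into every page, so the kernel has to back each page for real.
 * Returns the number of pages touched, or 0 if the allocation failed.
 * The buffer is handed back through `out`; the caller frees it.
 */
static size_t hog_memory(size_t bytes, unsigned char **out)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t page = psz > 0 ? (size_t)psz : 4096; /* fallback is an assumption */
	unsigned char *buf = malloc(bytes);
	size_t touched = 0;

	*out = buf;
	if (!buf)
		return 0;

	for (size_t off = 0; off < bytes; off += page) {
		buf[off] = 1;	/* first write faults the page in */
		touched++;
	}
	return touched;
}
```

Calling hog_memory() in a loop with a size close to the amount of free RAM, and keeping the buffers around, is the rough idea; the existing tools are the better choice for an actual reproducer.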


The only known workload that causes this is updating a large
container. Unfortunately, not every container update reproduces the
problem.

Is it possible to add more debugging information to make it clearer
what's going on?

If we knew who originally allocated that problematic page, that might help. Maybe page_owner could give some hints?


BUG: Bad page state in process kcompactd0 pfn:605811
page: refcount:0 mapcount:0 mapping:0000000082d91e3e index:0x1045efc4f
pfn:0x605811
aops:btree_aops ino:1
flags: 0x17ffffc600020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc600020c dead000000000100 dead000000000122 ffff888159075220
raw: 00000001045efc4f 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping

Seems to be an order-0 page, otherwise we would have another "head: ..." report.

It's not an anon/ksm/non-lru migration folio, because we clear the page->mapping field for them manually on the page freeing path. Likely it's a pagecache folio.
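
To make the mapping-type reasoning concrete, here is a minimal userspace sketch (not kernel code) of how the low two bits of page->mapping encode the type. The constants mirror PAGE_MAPPING_ANON, PAGE_MAPPING_MOVABLE and PAGE_MAPPING_FLAGS from include/linux/page-flags.h as I understand them for 6.9; the enum and the function name are made up for illustration.

```c
#include <stdint.h>

/* Userspace copies of the kernel's mapping tag bits (assumed values). */
#define MAPPING_ANON	UINT64_C(0x1)
#define MAPPING_MOVABLE	UINT64_C(0x2)
#define MAPPING_FLAGS	(MAPPING_ANON | MAPPING_MOVABLE)

/* Hypothetical names, only for this sketch. */
enum mapping_kind {
	KIND_NONE,	/* mapping == NULL */
	KIND_PAGECACHE,	/* untagged pointer, i.e. a struct address_space */
	KIND_ANON,	/* anonymous memory */
	KIND_MOVABLE,	/* non-lru movable page */
	KIND_KSM,	/* both tag bits set */
};

/* Classify a raw page->mapping word by its two low tag bits. */
static enum mapping_kind classify_mapping(uint64_t raw)
{
	if (raw == 0)
		return KIND_NONE;

	switch (raw & MAPPING_FLAGS) {
	case MAPPING_ANON:
		return KIND_ANON;
	case MAPPING_MOVABLE:
		return KIND_MOVABLE;
	case MAPPING_FLAGS:
		return KIND_KSM;
	default:
		return KIND_PAGECACHE;
	}
}
```

Assuming the fourth word of the first "raw:" line (ffff888159075220) is page->mapping, its low two bits are clear, so classify_mapping() lands in the page cache case. That is consistent with the "aops:" line in the dump.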

So one option is that something does not properly set folio->mapping to NULL. But wouldn't that problem then also show up without page migration? Hmm.

Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI,
BIOS 2611 04/07/2024
Call Trace:
<TASK>
dump_stack_lvl+0x84/0xd0
bad_page.cold+0xbe/0xe0
? __pfx_bad_page+0x10/0x10
? page_bad_reason+0x9d/0x1f0
free_unref_page+0x838/0x10e0
__folio_put+0x1ba/0x2b0
? __pfx___folio_put+0x10/0x10
? __pfx___might_resched+0x10/0x10

I suspect we come via
migrate_pages_batch()->migrate_folio_unmap()->migrate_folio_done().

Maybe this is the "Folio was freed from under us. So we are done." path
when "folio_ref_count(src) == 1".

Alternatively, we might come via
migrate_pages_batch()->migrate_folio_move()->migrate_folio_done().

For ordinary migration, move_to_new_folio() will clear src->mapping if
the folio was migrated successfully. That's the very first thing that migrate_folio_move() does, so I doubt that is the problem.

So I suspect we are in the migrate_folio_unmap() path. But for a !anon
folio, who would be freeing the folio concurrently (and not clearing
folio->mapping)? After all, we have to hold the folio lock while migrating.

In khugepaged:collapse_file() we manually set folio->mapping = NULL, before dropping the reference.

Something to try, to see if the problem goes away, might be:

diff --git a/mm/migrate.c b/mm/migrate.c
index dd04f578c19c..45e92e14c904 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1124,6 +1124,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
 		/* Folio was freed from under us. So we are done. */
 		folio_clear_active(src);
 		folio_clear_unevictable(src);
+		/*
+		 * Anonymous and movable src->mapping will be cleared by
+		 * free_pages_prepare so don't reset it here for keeping
+		 * the type to work PageAnon, for example.
+		 */
+		if (!folio_mapping_flags(src))
+			src->mapping = NULL;
 		/* free_pages_prepare() will clear PG_isolated. */
 		list_del(&src->lru);
 		migrate_folio_done(src, reason);

But it does feel weird: who freed the page concurrently and didn't clear folio->mapping ...
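
As a toy model of what that change is supposed to achieve, the following userspace sketch mimics the "freed from under us" branch. It is only my reading of the flow, heavily simplified: struct toy_folio and both functions are hypothetical, and the free-path model assumes that tagged (anon/movable) mappings are cleared on free while any other non-NULL mapping triggers the "non-NULL mapping" report.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAPPING_FLAGS	UINT64_C(0x3)	/* anon | movable tag bits (assumed) */

/* Toy stand-in for struct folio; only the fields this sketch needs. */
struct toy_folio {
	int refcount;
	uint64_t mapping;
};

/* Models the free path: tagged mappings are cleared, then the check runs. */
static bool toy_free_is_bad(struct toy_folio *f)
{
	if (f->mapping & MAPPING_FLAGS)
		f->mapping = 0;
	return f->mapping != 0;	/* "non-NULL mapping" */
}

/*
 * Models the "Folio was freed from under us" branch.
 * Returns -1 if the branch is not taken (refcount != 1),
 *          0 if the folio is freed cleanly,
 *          1 if freeing would report a bad page.
 */
static int toy_freed_from_under_us(struct toy_folio *src, bool apply_fix)
{
	if (src->refcount != 1)
		return -1;

	/* The proposed change: reset untagged (page cache) mappings. */
	if (apply_fix && !(src->mapping & MAPPING_FLAGS))
		src->mapping = 0;

	src->refcount = 0;	/* the last reference is dropped */
	return toy_free_is_bad(src) ? 1 : 0;
}
```

With refcount == 1 and an untagged mapping such as the one in the dump, the model reports a bad page without the change and a clean free with it. It says nothing about who left the stale mapping behind, which is the actual open question.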

We don't hold the folio lock of src at that point, though, but we do have the
only reference. So another possibility is folio refcount miscounting:
folio_ref_count() == 1 although there are other references (e.g., from the pagecache).


? migrate_folio_done+0x1de/0x2b0
migrate_pages_batch+0xe73/0x2880
? __pfx_compaction_alloc+0x10/0x10
? __pfx_compaction_free+0x10/0x10
? __pfx_migrate_pages_batch+0x10/0x10
? trace_irq_enable.constprop.0+0xce/0x110
? __pfx_remove_migration_pte+0x10/0x10
? rcu_is_watching+0x12/0xc0
migrate_pages+0x194f/0x22f0
? __pfx_compaction_alloc+0x10/0x10
? __pfx_compaction_free+0x10/0x10
? __pfx_migrate_pages+0x10/0x10
? trace_irq_enable.constprop.0+0xce/0x110
? rcu_is_watching+0x12/0xc0
? isolate_migratepages_block+0x2b02/0x4560
? __pfx_isolate_migratepages_block+0x10/0x10
? __pfx___might_resched+0x10/0x10
compact_zone+0x1a7c/0x3860
? rcu_is_watching+0x12/0xc0
? __pfx___free_object+0x10/0x10
? __pfx_compact_zone+0x10/0x10
? rcu_is_watching+0x12/0xc0
? lock_acquire+0x457/0x540
? kcompactd+0x2fa/0xc70
? rcu_is_watching+0x12/0xc0
compact_node+0x144/0x240
? __pfx_compact_node+0x10/0x10
? rcu_is_watching+0x12/0xc0
kcompactd+0x686/0xc70
? __pfx_kcompactd+0x10/0x10
? __pfx_autoremove_wake_function+0x10/0x10
? __kthread_parkme+0xb1/0x1d0
? __pfx_kcompactd+0x10/0x10
? __pfx_kcompactd+0x10/0x10
kthread+0x2d2/0x3a0
? _raw_spin_unlock_irq+0x28/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>


--
Thanks,

David / dhildenb