Re: [PATCH 6.0 19/20] mm/huge_memory: do not clobber swp_entry_t during THP split

From: Hugh Dickins
Date: Tue Oct 25 2022 - 11:12:07 EST


On Mon, 24 Oct 2022, Greg Kroah-Hartman wrote:
> From: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>
> commit 71e2d666ef85d51834d658830f823560c402b8b6 upstream.
>
> The following has been observed when running stress-ng mmap since commit
> b653db77350c ("mm: Clear page->private when splitting or migrating a page"):
>
> watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
> CPU: 75 PID: 9546 Comm: stress-ng Tainted: G E 6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
> Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
> RIP: 0010:xas_descend+0x28/0x80
> Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
> RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
> RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
> RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
> R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
> R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
> FS: 00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> xas_load+0x3a/0x50
> __filemap_get_folio+0x80/0x370
> ? put_swap_page+0x163/0x360
> pagecache_get_page+0x13/0x90
> __try_to_reclaim_swap+0x50/0x190
> scan_swap_map_slots+0x31e/0x670
> get_swap_pages+0x226/0x3c0
> folio_alloc_swap+0x1cc/0x240
> add_to_swap+0x14/0x70
> shrink_page_list+0x968/0xbc0
> reclaim_page_list+0x70/0xf0
> reclaim_pages+0xdd/0x120
> madvise_cold_or_pageout_pte_range+0x814/0xf30
> walk_pgd_range+0x637/0xa30
> __walk_page_range+0x142/0x170
> walk_page_range+0x146/0x170
> madvise_pageout+0xb7/0x280
> ? asm_common_interrupt+0x22/0x40
> madvise_vma_behavior+0x3b7/0xac0
> ? find_vma+0x4a/0x70
> ? find_vma+0x64/0x70
> ? madvise_vma_anon_name+0x40/0x40
> madvise_walk_vmas+0xa6/0x130
> do_madvise+0x2f4/0x360
> __x64_sys_madvise+0x26/0x30
> do_syscall_64+0x5b/0x80
> ? do_syscall_64+0x67/0x80
> ? syscall_exit_to_user_mode+0x17/0x40
> ? do_syscall_64+0x67/0x80
> ? syscall_exit_to_user_mode+0x17/0x40
> ? do_syscall_64+0x67/0x80
> ? do_syscall_64+0x67/0x80
> ? common_interrupt+0x8b/0xa0
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> The problem can be reproduced with the mmtests config
> config-workload-stressng-mmap. It does not always happen, and the time
> it takes to trigger varies, but it has happened on multiple machines.
>
> The intent of commit b653db77350c was to avoid the case where
> PG_private is clear but folio->private is non-NULL. However, THP tail
> pages use page->private for "swp_entry_t if folio_test_swapcache()" as
> stated in the documentation for struct folio. This patch only clobbers
> page->private for tail pages if the head page was not in swapcache and
> warns once if page->private had an unexpected value.
>
> Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@xxxxxxxxxxxxxxxxxxx
> Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page")
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Yang Shi <shy828301@xxxxxxxxx>
> Cc: Brian Foster <bfoster@xxxxxxxxxx>
> Cc: Dan Streetman <ddstreet@xxxxxxxx>
> Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> Cc: Oleksandr Natalenko <oleksandr@xxxxxxxxxxxxxx>
> Cc: Seth Jennings <sjenning@xxxxxxxxxx>
> Cc: Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
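
For anyone following along, the behaviour described in the changelog above
can be modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct fake_page, split_tail() and the head_in_swapcache
flag are hypothetical stand-ins invented for illustration, not the kernel's
code, and the fprintf() merely imitates a warn-once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; only the field discussed here. */
struct fake_page {
	unsigned long private;
};

/* Models the "warn once" behaviour: set after the first warning. */
static bool warned_once;

/*
 * Model of the tail-page fixup during a THP split.
 *
 * If the head page was in swapcache, the tail's ->private holds a
 * swp_entry_t and must be left alone. Otherwise ->private is cleared,
 * warning (once only) if it held an unexpected non-zero value.
 *
 * Returns true if this call emitted the warning.
 */
static bool split_tail(struct fake_page *tail, bool head_in_swapcache)
{
	bool warned = false;

	assert(tail != NULL);

	if (head_in_swapcache)
		return false;

	if (tail->private != 0 && !warned_once) {
		fprintf(stderr, "unexpected tail private: %#lx\n",
			tail->private);
		warned_once = true;
		warned = true;
	}
	tail->private = 0;
	return warned;
}
```

The point of the model is the asymmetry: a swapcache tail keeps its value
untouched, while any other tail is cleared, and it is that second branch
which can emit the warning.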

Greg, this patch from Mel is important, but it introduces a warning
which is giving Tvrtko trouble - see the linux-mm thread:
https://lore.kernel.org/linux-mm/1596edbb-02ad-6bdf-51b8-15c2d2e08b76@xxxxxxxxxxxxxxx/

We already have the fix for the warning; it is making its way through the
system and is marked for stable, but it has not reached Linus's tree yet.

Please drop this 19/20 from 6.0.4, then I'll reply to this to let you know
when the fix does reach Linus's tree - hopefully the two can go together
in the next 6.0-stable.

I apologize for not writing yesterday: Gmail had gathered together
different threads with the same subject, so I thought you and stable
were Cc'ed on the linux-mm mail and that you would immediately drop it
yourself, but in fact you were not on that thread at all.

Thanks,
Hugh