Re: [syzbot] [mm?] KCSAN: data-race in __delete_from_swap_cache / folio_mapping (3)

From: David Hildenbrand
Date: Wed Apr 03 2024 - 18:06:14 EST


On 03.04.24 23:44, Andrew Morton wrote:
On Tue, 02 Apr 2024 13:10:29 -0700 syzbot <syzbot+58fc2a881f3b3df5e336@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: 39cd87c4eb2b Linux 6.9-rc2
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=129de21d180000
kernel config: https://syzkaller.appspot.com/x/.config?x=d024e89f7bb376ce
dashboard link: https://syzkaller.appspot.com/bug?extid=58fc2a881f3b3df5e336
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b9b2dcffd7d5/disk-39cd87c4.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/5f4981fa60e6/vmlinux-39cd87c4.xz
kernel image: https://storage.googleapis.com/syzbot-assets/691f671f70ad/bzImage-39cd87c4.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+58fc2a881f3b3df5e336@xxxxxxxxxxxxxxxxxxxxxxxxx

==================================================================
BUG: KCSAN: data-race in __delete_from_swap_cache / folio_mapping

write to 0xffffea0004798fa8 of 8 bytes by task 29 on cpu 0:
__delete_from_swap_cache+0x1f2/0x290 mm/swap_state.c:161

folio->swap.val = 0;

Here we are holding the folio lock and really must invalidate that swap entry, because we are removing it from the swap cache.


delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:241
folio_free_swap+0x19f/0x1c0 mm/swapfile.c:1600
free_swap_cache mm/swap_state.c:290 [inline]
free_pages_and_swap_cache+0x1d9/0x400 mm/swap_state.c:322
__tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
tlb_finish_mmu+0x8c/0x100 mm/mmu_gather.c:465
__oom_reap_task_mm+0x231/0x2e0 mm/oom_kill.c:553
oom_reap_task_mm mm/oom_kill.c:589 [inline]
oom_reap_task mm/oom_kill.c:613 [inline]
oom_reaper+0x264/0x850 mm/oom_kill.c:654
kthread+0x1d1/0x210 kernel/kthread.c:388
ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

read to 0xffffea0004798fa8 of 8 bytes by task 14567 on cpu 1:
folio_mapping+0xd2/0x110 mm/util.c:797

return swap_address_space(folio->swap);


And in this black-magic LRU thingy we don't.

We call folio_evictable()->folio_mapping()

Which ends up doing:

if (unlikely(folio_test_swapcache(folio)))
return swap_address_space(folio->swap);

which can easily race with the code above, because we don't hold the folio lock.

Not sure if we should use READ_ONCE()/WRITE_ONCE() here and try to handle the race differently. We have to be prepared for folio_test_swapcache() == true but then failing to get the address space, because we are concurrently removing the folio from the swap cache.
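
To make that idea concrete, here is a rough userspace sketch, not a proposed patch. Everything in it is a stand-in: struct folio_model, swapper_space_model, model_delete_from_swap_cache() and model_folio_mapping() are hypothetical names, the READ_ONCE()/WRITE_ONCE() macros are simplified versions of the kernel ones, and the store order in the writer is an assumption about what __delete_from_swap_cache() does. It only shows the shape of a lockless reader that tolerates seeing the swapcache flag set while the entry is already invalidated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros (assumption:
 * the real definitions do more; these only force a single access). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

typedef struct { unsigned long val; } swp_entry_t;

struct address_space { int id; };

/* Hypothetical model of the two folio fields involved in the race. */
struct folio_model {
	bool swapcache;		/* models the swapcache page flag */
	swp_entry_t swap;	/* models folio->swap */
};

/* Hypothetical stand-in for whatever swap_address_space() would return. */
static struct address_space swapper_space_model = { .id = 1 };

/* Writer side: models the removal path, which runs with the folio
 * lock held. Assumed order: invalidate the entry, then clear the flag. */
static void model_delete_from_swap_cache(struct folio_model *folio)
{
	WRITE_ONCE(folio->swap.val, 0);
	WRITE_ONCE(folio->swapcache, false);
}

/* Reader side: models the lockless lookup. The entry is read exactly
 * once, and a zero value is treated as "lost the race with removal". */
static struct address_space *model_folio_mapping(struct folio_model *folio)
{
	if (READ_ONCE(folio->swapcache)) {
		unsigned long val = READ_ONCE(folio->swap.val);

		if (!val)
			return NULL;	/* concurrently removed */
		return &swapper_space_model;
	}
	return NULL;
}
```

Whether every lockless caller can cope with a NULL result in that window is exactly the open question; the sketch does not answer it.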

folio_evictable mm/internal.h:256 [inline]
move_folios_to_lru+0x137/0x690 mm/vmscan.c:1808
shrink_inactive_list mm/vmscan.c:1929 [inline]
shrink_list mm/vmscan.c:2163 [inline]
shrink_lruvec+0xbd8/0x1640 mm/vmscan.c:5687
shrink_node_memcgs mm/vmscan.c:5873 [inline]
shrink_node+0xa78/0x15a0 mm/vmscan.c:5908
shrink_zones mm/vmscan.c:6152 [inline]
do_try_to_free_pages+0x3cc/0xca0 mm/vmscan.c:6214

...


These both point at David's 3d2c90876887 ("mm/swap: inline
folio_set_swap_entry() and folio_swap_entry()") which is probably
innocent, but I have to blame someone ;)

Heh, I'm pretty sure that one is innocent. After staring at the above race, the other work in the same series is likely innocent as well. But nothing is impossible ;)

@Willy, Hugh, any idea regarding above race?

--
Cheers,

David / dhildenb