Re: [PATCH] zswap: don't warn if none swapcache folio is passed to zswap_load

From: Yosry Ahmed
Date: Thu Aug 10 2023 - 23:09:05 EST


On Thu, Aug 10, 2023 at 8:03 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> On Thu, Aug 10, 2023 at 6:37 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
> >
> >
> >
> > On 8/11/2023 7:43 AM, Yu Zhao wrote:
> > > On Thu, Aug 10, 2023 at 5:31 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
> > >>
> > >>
> > >>
> > >> On 8/11/2023 7:15 AM, Yosry Ahmed wrote:
> > >>> On Thu, Aug 10, 2023 at 4:09 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 8/11/2023 2:44 AM, Yu Zhao wrote:
> > >>>>> On Thu, Aug 10, 2023 at 3:58 AM Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
> > >>>>>>
> > >>>>>> With the mm-unstable branch, triggering swap activity can produce the
> > >>>>>> following warning:
> > >>>>>> [ 178.093511][ T651] WARNING: CPU: 2 PID: 651 at mm/zswap.c:1387 zswap_load+0x67/0x570
> > >>>>>> [ 178.095155][ T651] Modules linked in:
> > >>>>>> [ 178.096103][ T651] CPU: 2 PID: 651 Comm: gmain Not tainted 6.5.0-rc4-00492-gad3232df3e41 #148
> > >>>>>> [ 178.098372][ T651] Hardware name: QEMU Standard PC (i440FX + PIIX,1996), BIOS 1.14.0-2 04/01/2014
> > >>>>>> [ 178.101114][ T651] RIP: 0010:zswap_load+0x67/0x570
> > >>>>>> [ 178.102359][ T651] Code: a0 78 4b 85 e8 ea db ff ff 48 8b 00 a8 01 0f 84 84 04 00 00 48 89 df e8 d7 db ff ff 48 8b 00 a9 00 00 08 00 0f 85 c4
> > >>>>>> [ 178.106376][ T651] RSP: 0018:ffffc900011b3760 EFLAGS: 00010246
> > >>>>>> [ 178.107675][ T651] RAX: 0017ffffc0080001 RBX: ffffea0004a991c0 RCX:ffffc900011b37dc
> > >>>>>> [ 178.109242][ T651] RDX: 0000000000000000 RSI: 0000000000000001 RDI:ffffea0004a991c0
> > >>>>>> [ 178.110916][ T651] RBP: ffffea0004a991c0 R08: 0000000000000243 R09:00000000c9a1aafc
> > >>>>>> [ 178.112377][ T651] R10: 00000000c9657db3 R11: 000000003c9657db R12:0000000000014b9c
> > >>>>>> [ 178.113698][ T651] R13: ffff88813501e710 R14: ffff88810d591000 R15:0000000000000000
> > >>>>>> [ 178.115008][ T651] FS: 00007fb21a9ff700(0000) GS:ffff88813bc80000(0000) knlGS:0000000000000000
> > >>>>>> [ 178.116423][ T651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >>>>>> [ 178.117421][ T651] CR2: 00005632cbfc81f6 CR3: 0000000131450002 CR4:0000000000370ee0
> > >>>>>> [ 178.118683][ T651] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
> > >>>>>> [ 178.119894][ T651] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
> > >>>>>> [ 178.121087][ T651] Call Trace:
> > >>>>>> [ 178.121654][ T651] <TASK>
> > >>>>>> [ 178.122109][ T651] ? zswap_load+0x67/0x570
> > >>>>>> [ 178.122658][ T651] ? __warn+0x81/0x170
> > >>>>>> [ 178.123119][ T651] ? zswap_load+0x67/0x570
> > >>>>>> [ 178.123608][ T651] ? report_bug+0x167/0x190
> > >>>>>> [ 178.124150][ T651] ? handle_bug+0x3c/0x70
> > >>>>>> [ 178.124615][ T651] ? exc_invalid_op+0x13/0x60
> > >>>>>> [ 178.125192][ T651] ? asm_exc_invalid_op+0x16/0x20
> > >>>>>> [ 178.125753][ T651] ? zswap_load+0x67/0x570
> > >>>>>> [ 178.126231][ T651] ? lock_acquire+0xbb/0x290
> > >>>>>> [ 178.126745][ T651] ? folio_add_lru+0x40/0x1c0
> > >>>>>> [ 178.127261][ T651] ? find_held_lock+0x2b/0x80
> > >>>>>> [ 178.127776][ T651] swap_readpage+0xc7/0x5c0
> > >>>>>> [ 178.128273][ T651] do_swap_page+0x86d/0xf50
> > >>>>>> [ 178.128770][ T651] ? __pte_offset_map+0x3e/0x290
> > >>>>>> [ 178.129321][ T651] ? __pte_offset_map+0x1c4/0x290
> > >>>>>> [ 178.129883][ T651] __handle_mm_fault+0x6ad/0xca0
> > >>>>>> [ 178.130419][ T651] handle_mm_fault+0x18b/0x410
> > >>>>>> [ 178.130992][ T651] do_user_addr_fault+0x1f1/0x820
> > >>>>>> [ 178.132076][ T651] exc_page_fault+0x63/0x1a0
> > >>>>>> [ 178.132599][ T651] asm_exc_page_fault+0x22/0x30
> > >>>>>>
> > >>>>>> It's possible that swap_readpage() is called with a non-swapcache folio
> > >>>>>> in do_swap_page(), which triggers this warning. So we shouldn't assume
> > >>>>>> zswap_load() always takes a swapcache folio.
> > >>>>>
> > >>>>> Did you use a bdev with QUEUE_FLAG_SYNCHRONOUS? Otherwise it sounds
> > >>>>> like a bug to me.
> > >>>> I hit this warning with zram which has QUEUE_FLAG_SYNCHRONOUS set. Thanks.
> > >>>
> > >>> Does it make sense to keep the warning and instead change it to check
> > >>> SWP_SYNCHRONOUS_IO as well? Something like:
> > >>>
> > >>> VM_WARN_ON_ONCE(!folio_test_swapcache(folio) &&
> > >>>         !(swap_type_to_swap_info(type)->flags & SWP_SYNCHRONOUS_IO));
> > >>>
> > >>> Of course this is too ugly, so perhaps we want a helper to check if a
> > >>> swapfile is synchronous.
> > >> My understanding was that the WARN is here because zswap_load() doesn't
> > >> expect a folio that is not in the swapcache. With zram, swap_readpage() must
> > >> accept a folio that is not in the swapcache. So this WARN should not be there.
> > >>
> > >> But your comment makes more sense to me. I will update the patch not
> > >> to remove this WARN. Thanks.
> > >
> > > That can cause another warning.
> > My understanding is that WARN may be wanted by zswap code.
> >
> > >
> > > Please don't overengineer.
>
> The original patch looks good to me. What Yosry suggested seems not
> only overengineered but also can cause a new KCSAN warning.

I suppose that can be easily mitigated with data_race(), similar to
do_swap_page().
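
To make the idea concrete, here is a rough userspace sketch of the check
with the flags read wrapped in data_race(). Everything prefixed with demo_
or DEMO_ is a hypothetical stand-in made up for illustration, and the
data_race() macro below is only a no-op placeholder for the kernel one, so
treat this as a model of the logic rather than a patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in. In the kernel, data_race() marks a racy access as
 * intentional so that KCSAN does not report it.
 */
#define data_race(expr) (expr)

/* Placeholder bit for illustration; the kernel defines the real SWP_SYNCHRONOUS_IO. */
#define DEMO_SWP_SYNCHRONOUS_IO (1UL << 0)

/* Hypothetical, trimmed-down model of a swap device descriptor. */
struct demo_swap_info {
	unsigned long flags;
};

/* Hypothetical helper: is this swap device flagged as synchronous? */
static bool demo_swap_is_synchronous(const struct demo_swap_info *si)
{
	/* Bitwise & tests the flag; logical && would only test "nonzero". */
	return (data_race(si->flags) & DEMO_SWP_SYNCHRONOUS_IO) != 0;
}

/* Condition under which the warning would fire. */
static bool demo_should_warn(bool in_swapcache, const struct demo_swap_info *si)
{
	return !in_swapcache && !demo_swap_is_synchronous(si);
}
```

With this shape, a non-swapcache folio coming from a synchronous device
such as zram would not warn, while the same folio coming from any other
device still would.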

Anyway, I don't feel strongly about it. If you do, then we can go with
the current patch :)

It just feels odd to me to drop a warning from zswap due to an
interaction with zram, which should not be happening in practice.