summary: regression: nfs mount (even idle) eventually hangs server
From: Mike Galbraith
Date: Thu Jan 05 2023 - 09:14:45 EST
Executive summary:
44df6f439a17 appears to be the culprit: backporting it and its dependency
3959066b697b to 6.1 introduces the problem there, and reverting both in
master cures it.
Reproducer:
mount box:/$pick_a_spot /mnt (server may even mount itself)
find /mnt -type f -exec md5sum {} \;
As that executes, run LTP testcases/bin/min_free_kbytes (or likely any
similar memory hog) on the server. The abbreviated spew below should
follow in short order, and brick the server.
Here the reproducer just works, be it on a real box, a VM, or even a cute little Raspberry Pi 4B.
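For convenience, the reproducer steps can be wrapped in a small script. This is only a sketch: SERVER, EXPORT, MNT and LTP_DIR are placeholders that do not appear in the report (LTP_DIR assumes LTP's default /opt/ltp install prefix), and the run() helper with its DRY_RUN switch is a hypothetical addition so the commands can be printed without touching anything.

```shell
#!/bin/sh
# Sketch of the reproducer steps. All values below are placeholders,
# not taken from the report; adjust them to your setup.
SERVER=${SERVER:-box}          # NFS server (may be the local box itself)
EXPORT=${EXPORT:-/export}      # stands in for /$pick_a_spot
MNT=${MNT:-/mnt}               # mount point on the client
LTP_DIR=${LTP_DIR:-/opt/ltp}   # assumption: default LTP install prefix

# Hypothetical helper: with DRY_RUN=1 print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Client side: mount the export and read every file to generate NFS traffic.
client_side() {
    run mount "$SERVER:$EXPORT" "$MNT"
    run find "$MNT" -type f -exec md5sum {} \;
}

# Server side: apply memory pressure while the client is busy.
server_side() {
    run "$LTP_DIR/testcases/bin/min_free_kbytes"
}
```

Source the file, then call client_side on the client and, while it runs, server_side as root on the server. Setting DRY_RUN=1 first only prints the commands.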
Abbreviated workqueue spew:
> [ 1171.959773] ------------[ cut here ]------------
> [ 1171.959792] WARNING: CPU: 4 PID: 81 at kernel/workqueue.c:1654 __queue_delayed_work+0x6a/0x90
> [ 1171.959804] Modules linked in: netconsole(E) af_packet(E) hid_logitech_hidpp(E) joydev(E) usblp(E) hid_logitech_dj(E) ip6table_mangle(E) ip6table_raw(E) iptable_raw(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) bpfilter(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) coretemp(E) snd_hda_codec_hdmi(E) iTCO_wdt(E) snd_hda_intel(E) at24(E) nls_iso8859_1(E) kvm_intel(E) intel_pmc_bxt(E) snd_intel_dspcfg(E) regmap_i2c(E) mei_hdcp(E) nls_cp437(E) iTCO_vendor_support(E) snd_hda_codec(E) snd_hwdep(E) kvm(E) snd_hda_core(E) irqbypass(E) r8169(E) pcspkr(E) snd_pcm(E) realtek(E) i2c_i801(E) mei_me(E) snd_timer(E) mdio_devres(E) snd(E) lpc_ich(E) i2c_smbus(E) libphy(E) soundcore(E) mfd_core(E) mei(E) fan(E) thermal(E) intel_smartconnect(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
> [ 1171.959869] lockd(E) sch_fq_codel(E) grace(E) fuse(E) sunrpc(E) configfs(E) ip_tables(E) x_tables(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) drm_ttm_helper(E) ttm(E) i2c_algo_bit(E) drm_display_helper(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm_kms_helper(E) sha512_ssse3(E) ahci(E) sha512_generic(E) xhci_pci(E) syscopyarea(E) aesni_intel(E) sysfillrect(E) libahci(E) ehci_pci(E) sysimgblt(E) crypto_simd(E) xhci_hcd(E) ehci_hcd(E) cryptd(E) drm(E) libata(E) cec(E) usbcore(E) usb_common(E) rc_core(E) video(E) wmi(E) button(E) sd_mod(E) t10_pi(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) crc64(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio(E) virtio_ring(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) msr(E) efivarfs(E) autofs4(E)
> [ 1171.959962] CPU: 4 PID: 81 Comm: kswapd0 Kdump: loaded Tainted: G E 6.1.0.g6feb57c-master #41
> [ 1171.959969] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 1171.959972] RIP: 0010:__queue_delayed_work+0x6a/0x90
> [ 1171.959981] Code: 50 48 01 c1 83 ff 08 48 89 4a 30 75 2c 4c 89 c7 e9 1b b1 07 00 e9 46 e8 ff ff 0f 0b eb cc 0f 0b 48 81 7a 38 20 32 0a 81 74 aa <0f> 0b 48 8b 42 28 48 85 c0 74 a8 0f 0b eb a4 89 fe 4c 89 c7 e9 1d
> [ 1171.959988] RSP: 0018:ffff8881010a7c78 EFLAGS: 00010003
> [ 1171.959992] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 1171.959997] RDX: ffff88818016b748 RSI: ffff88810a0bae00 RDI: 0000000000000008
> [ 1171.960001] RBP: ffff88818016b748 R08: 0000000000000000 R09: 0000000000000000
> [ 1171.960005] R10: 0000000000000000 R11: 000000000354f7fb R12: 0000000000000008
> [ 1171.960010] R13: ffff88810a0bae00 R14: 0000000000000000 R15: ffff88818016b710
> [ 1171.960015] FS: 0000000000000000(0000) GS:ffff88840ed00000(0000) knlGS:0000000000000000
> [ 1171.960019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1171.960022] CR2: 00007f39558dc000 CR3: 00000001b893a002 CR4: 00000000001706e0
> [ 1171.960026] Call Trace:
> [ 1171.960033] <TASK>
> [ 1171.960038] mod_delayed_work_on+0x49/0x70
> [ 1171.960049] nfsd4_state_shrinker_count+0x24/0x50 [nfsd]
> [ 1171.960113] shrink_slab.constprop.94+0x9d/0x370
> [ 1171.960122] shrink_node+0x1c5/0x420
> [ 1171.960129] balance_pgdat+0x25f/0x530
> [ 1171.960137] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1171.960143] kswapd+0x12c/0x360
> [ 1171.960149] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1171.960155] ? __pfx_kswapd+0x10/0x10
> [ 1171.960161] kthread+0xc0/0xe0
> [ 1171.960167] ? __pfx_kthread+0x10/0x10
> [ 1171.960172] ret_from_fork+0x29/0x50
> [ 1171.960180] </TASK>
> [ 1171.960184] ---[ end trace 0000000000000000 ]---
> [ 1171.960190] ------------[ cut here ]------------
> [ 1171.960193] WARNING: CPU: 4 PID: 81 at kernel/workqueue.c:1656 __queue_delayed_work+0x5a/0x90
> [ 1171.960202] Modules linked in: netconsole(E) af_packet(E) hid_logitech_hidpp(E) joydev(E) usblp(E) hid_logitech_dj(E) ip6table_mangle(E) ip6table_raw(E) iptable_raw(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) bpfilter(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) coretemp(E) snd_hda_codec_hdmi(E) iTCO_wdt(E) snd_hda_intel(E) at24(E) nls_iso8859_1(E) kvm_intel(E) intel_pmc_bxt(E) snd_intel_dspcfg(E) regmap_i2c(E) mei_hdcp(E) nls_cp437(E) iTCO_vendor_support(E) snd_hda_codec(E) snd_hwdep(E) kvm(E) snd_hda_core(E) irqbypass(E) r8169(E) pcspkr(E) snd_pcm(E) realtek(E) i2c_i801(E) mei_me(E) snd_timer(E) mdio_devres(E) snd(E) lpc_ich(E) i2c_smbus(E) libphy(E) soundcore(E) mfd_core(E) mei(E) fan(E) thermal(E) intel_smartconnect(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
> [ 1171.960259] lockd(E) sch_fq_codel(E) grace(E) fuse(E) sunrpc(E) configfs(E) ip_tables(E) x_tables(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) drm_ttm_helper(E) ttm(E) i2c_algo_bit(E) drm_display_helper(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm_kms_helper(E) sha512_ssse3(E) ahci(E) sha512_generic(E) xhci_pci(E) syscopyarea(E) aesni_intel(E) sysfillrect(E) libahci(E) ehci_pci(E) sysimgblt(E) crypto_simd(E) xhci_hcd(E) ehci_hcd(E) cryptd(E) drm(E) libata(E) cec(E) usbcore(E) usb_common(E) rc_core(E) video(E) wmi(E) button(E) sd_mod(E) t10_pi(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) crc64(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio(E) virtio_ring(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) msr(E) efivarfs(E) autofs4(E)
> [ 1171.960352] CPU: 4 PID: 81 Comm: kswapd0 Kdump: loaded Tainted: G W E 6.1.0.g6feb57c-master #41
> [ 1171.960358] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 1171.960363] RIP: 0010:__queue_delayed_work+0x5a/0x90
> [ 1171.960369] Code: 8b 05 0a 47 16 01 4c 8d 42 20 48 89 72 48 89 7a 50 48 01 c1 83 ff 08 48 89 4a 30 75 2c 4c 89 c7 e9 1b b1 07 00 e9 46 e8 ff ff <0f> 0b eb cc 0f 0b 48 81 7a 38 20 32 0a 81 74 aa 0f 0b 48 8b 42 28
> [ 1171.960377] RSP: 0018:ffff8881010a7c78 EFLAGS: 00010003
> [ 1171.960383] RAX: ffff88818016b750 RBX: 0000000000000000 RCX: 0000000000000000
> [ 1171.960388] RDX: ffff88818016b748 RSI: ffff88810a0bae00 RDI: 0000000000000008
> [ 1171.960394] RBP: ffff88818016b748 R08: 0000000000000000 R09: 0000000000000000
> [ 1171.960453] R10: 0000000000000000 R11: 000000000354f7fb R12: 0000000000000008
> [ 1171.960459] R13: ffff88810a0bae00 R14: 0000000000000000 R15: ffff88818016b710
> [ 1171.960463] FS: 0000000000000000(0000) GS:ffff88840ed00000(0000) knlGS:0000000000000000
> [ 1171.960468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1171.960473] CR2: 00007f39558dc000 CR3: 00000001b893a002 CR4: 00000000001706e0
> [ 1171.960478] Call Trace:
> [ 1171.960481] <TASK>
> [ 1171.960484] mod_delayed_work_on+0x49/0x70
> [ 1171.960492] nfsd4_state_shrinker_count+0x24/0x50 [nfsd]
> [ 1171.960542] shrink_slab.constprop.94+0x9d/0x370
> [ 1171.960550] shrink_node+0x1c5/0x420
> [ 1171.960557] balance_pgdat+0x25f/0x530
> [ 1171.960564] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1171.960571] kswapd+0x12c/0x360
> [ 1171.960578] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1171.960583] ? __pfx_kswapd+0x10/0x10
> [ 1171.960589] kthread+0xc0/0xe0
> [ 1171.960594] ? __pfx_kthread+0x10/0x10
> [ 1171.960599] ret_from_fork+0x29/0x50
> [ 1171.960607] </TASK>
> [ 1171.960611] ---[ end trace 0000000000000000 ]---
> [ 1171.960617] ------------[ cut here ]------------
> [ 1171.960620] WARNING: CPU: 4 PID: 81 at kernel/workqueue.c:1499 __queue_work+0x33b/0x3d0
> [ 1171.960628] Modules linked in: netconsole(E) af_packet(E) hid_logitech_hidpp(E) joydev(E) usblp(E) hid_logitech_dj(E) ip6table_mangle(E) ip6table_raw(E) iptable_raw(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) nfnetlink(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) bpfilter(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ledtrig_audio(E) coretemp(E) snd_hda_codec_hdmi(E) iTCO_wdt(E) snd_hda_intel(E) at24(E) nls_iso8859_1(E) kvm_intel(E) intel_pmc_bxt(E) snd_intel_dspcfg(E) regmap_i2c(E) mei_hdcp(E) nls_cp437(E) iTCO_vendor_support(E) snd_hda_codec(E) snd_hwdep(E) kvm(E) snd_hda_core(E) irqbypass(E) r8169(E) pcspkr(E) snd_pcm(E) realtek(E) i2c_i801(E) mei_me(E) snd_timer(E) mdio_devres(E) snd(E) lpc_ich(E) i2c_smbus(E) libphy(E) soundcore(E) mfd_core(E) mei(E) fan(E) thermal(E) intel_smartconnect(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
> [ 1171.960687] lockd(E) sch_fq_codel(E) grace(E) fuse(E) sunrpc(E) configfs(E) ip_tables(E) x_tables(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) drm_ttm_helper(E) ttm(E) i2c_algo_bit(E) drm_display_helper(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drm_kms_helper(E) sha512_ssse3(E) ahci(E) sha512_generic(E) xhci_pci(E) syscopyarea(E) aesni_intel(E) sysfillrect(E) libahci(E) ehci_pci(E) sysimgblt(E) crypto_simd(E) xhci_hcd(E) ehci_hcd(E) cryptd(E) drm(E) libata(E) cec(E) usbcore(E) usb_common(E) rc_core(E) video(E) wmi(E) button(E) sd_mod(E) t10_pi(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) crc64(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio(E) virtio_ring(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) msr(E) efivarfs(E) autofs4(E)
> [ 1171.960775] CPU: 4 PID: 81 Comm: kswapd0 Kdump: loaded Tainted: G W E 6.1.0.g6feb57c-master #41
> [ 1171.960781] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [ 1171.960787] RIP: 0010:__queue_work+0x33b/0x3d0
> [ 1171.960793] Code: 25 40 d3 02 00 f6 47 2c 20 74 18 e8 6f 6f 00 00 48 85 c0 74 0e 48 8b 40 20 48 3b 68 08 0f 84 f5 fc ff ff 0f 0b e9 fe fd ff ff <0f> 0b e9 ee fd ff ff 83 c9 02 49 8d 57 68 e9 d7 fd ff ff 80 3d f3
> [ 1171.960801] RSP: 0018:ffff8881010a7c38 EFLAGS: 00010003
> [ 1171.960807] RAX: ffff88818016b750 RBX: ffffffff81fcc880 RCX: 0000000000000000
> [ 1171.960811] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100078000
> [ 1171.960816] RBP: ffff88810a0bae00 R08: ffff888100400028 R09: ffff888100400000
> [ 1171.960821] R10: 0000000000000000 R11: ffffffff8225d5c8 R12: 0000000000000008
> [ 1171.960826] R13: 0000000000000004 R14: ffff88818016b748 R15: ffff888120a5e000
> [ 1171.960831] FS: 0000000000000000(0000) GS:ffff88840ed00000(0000) knlGS:0000000000000000
> [ 1171.960838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1171.960842] CR2: 00007f39558dc000 CR3: 00000001b893a002 CR4: 00000000001706e0
> [ 1171.960848] Call Trace:
> [ 1171.960853] <TASK>
> [ 1171.960857] mod_delayed_work_on+0x49/0x70
> [ 1171.960864] nfsd4_state_shrinker_count+0x24/0x50 [nfsd]
> [ 1171.960912] shrink_slab.constprop.94+0x9d/0x370
> [ 1171.960919] shrink_node+0x1c5/0x420
> [ 1171.960926] balance_pgdat+0x25f/0x530
> [ 1171.960932] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1171.960939] kswapd+0x12c/0x360
> [ 1171.960945] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 1171.960950] ? __pfx_kswapd+0x10/0x10
> [ 1171.960956] kthread+0xc0/0xe0
> [ 1171.960961] ? __pfx_kthread+0x10/0x10
> [ 1171.960965] ret_from_fork+0x29/0x50
> [ 1171.960973] </TASK>
> [ 1171.960976] ---[ end trace 0000000000000000 ]---