Any known soft lockup issue with vfs_write()->fsnotify()?
From: Dexuan Cui
Date: Fri Mar 02 2018 - 17:29:12 EST
Hi,
Recently people are getting a soft lock issue with vfs_write()->fsnotify().
The detailed calltrace is available at:
https://github.com/coreos/bugs/issues/2356
https://github.com/coreos/bugs/issues/2364
The kernel versions showing up the issue are:
4.14.11-coreos
4.14.19-coreos
4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.
I have not got a chance to try the latest mainline kernel yet.
Before the lockup error message suddenly appears, Linux has been running fine for many hours.
I have NOT found a consistent way to reproduce the lockup yet.
Looks the kernel is stuck in fsnotify(), when it tries to get the fsnotify_mark_srcu lock.
"git log fs/notify/fsnotify.c" on the latest mainline shows that some recent patches might help.
I'd like to check if this is a known issue.
Looking forward to your insights!
Thanks,
-- Dexuan
For your convenience, this is a calltrace from the first link:
18h 30m 8.626s( 4ms): ip-172-45-43-199 login: [67361.641359] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [java:87260]
18h 42m 40.116s(751490ms): [67361.644600] Modules linked in: xfs xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_comment xt_mark veth nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic vxlan ip6_udp_tunnel udp_tunnel overlay mousedev psmouse sb_edac i2c_piix4 i2c_core evdev edac_core button xenfs xen_privcmd sch_fq_codel nls_ascii nls_cp437 vfat fat dm_verity dm_bufio ext4 crc16 mbcache jbd2 fscrypto crc32c_intel ata_piix aesni_intel xen_blkfront libata aes_x86_64 crypto_simd cryptd glue_helper scsi_mod ixgbevf dm_mirror dm_region_hash dm_log dm_mod dax
18h 42m 40.142s( 26ms): [67361.668103] CPU: 10 PID: 87260 Comm: java Not tainted 4.14.11-coreos #1
18h 42m 40.142s( 0ms): [67361.670391] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
18h 42m 40.144s( 2ms): [67361.672581] task: ffff90d6009dbc80 task.stack: ffffb2388f704000
18h 42m 40.149s( 5ms): [67361.674604] RIP: 0010:fsnotify+0x166/0x520
18h 42m 40.149s( 0ms): [67361.675971] RSP: 0018:ffffb2388f707e10 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
18h 42m 40.150s( 1ms): [67361.678462] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
18h 42m 40.152s( 2ms): [67361.680986] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffffff907294c0
18h 42m 40.157s( 5ms): [67361.683340] RBP: ffff90d2eddffed8 R08: 0000000000000000 R09: 0000000000000000
18h 42m 40.157s( 0ms): [67361.685709] R10: ffffdd941da7a100 R11: 0000000000000000 R12: ffff90d2eddfff00
18h 42m 40.159s( 2ms): [67361.688199] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
18h 42m 40.165s( 6ms): [67361.690579] FS: 00007f491c3f4700(0000) GS:ffff90d6ef880000(0000) knlGS:0000000000000000
18h 42m 40.165s( 0ms): [67361.693227] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
18h 42m 40.166s( 1ms): [67361.695206] CR2: 000000c421288300 CR3: 00000005ba5fc002 CR4: 00000000001606e0
18h 42m 40.175s( 9ms): [67361.697655] Call Trace:
18h 42m 40.175s( 0ms): [67361.698499] vfs_write+0x14f/0x1a0
18h 42m 40.175s( 0ms): [67361.699656] SyS_write+0x52/0xc0
18h 42m 40.175s( 0ms): [67361.700745] do_syscall_64+0x59/0x1c0
18h 42m 40.175s( 0ms): [67361.701996] entry_SYSCALL64_slow_path+0x25/0x25
18h 42m 40.175s( 0ms): [67361.703536] RIP: 0033:0x7f4b5566643d
18h 42m 40.176s( 1ms): [67361.704751] RSP: 002b:00007f491c3f0ef0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
18h 42m 40.179s( 3ms): [67361.707375] RAX: ffffffffffffffda RBX: 0000000000000032 RCX: 00007f4b5566643d
18h 42m 40.188s( 9ms): [67361.709849] RDX: 00000000000000f0 RSI: 00007f491c3f0f50 RDI: 0000000000000f40
18h 42m 40.188s( 0ms): [67361.712205] RBP: 00007f491c3f0f20 R08: 00007f491c3f1030 R09: 00000005f4ff70d8
18h 42m 40.188s( 0ms): [67361.714573] R10: 0000000000052f06 R11: 0000000000000293 R12: 00000000000000f0
18h 42m 40.188s( 0ms): [67361.716924] R13: 00007f491c3f0f50 R14: 0000000000000f40 R15: 0000000000000000
18h 42m 40.191s( 3ms): [67361.719331] Code: 40 4c 89 7c 24 48 4c 89 7c 24 08 8b 44 24 18 25 00 00 03 00 89 44 24 34 4d 85 e4 0f 95 c2 48 83 7c 24 08 00 0f 95 c1 89 c8 08 d0 <0f> 84 96 03 00 00 84 d2 0f 84 e8 02 00 00 48 8b 44 24 40 84 c9