Re: Still a pretty bad time on 5.4.6 with fuse_request_end.

From: michael+lkml
Date: Sun Feb 09 2020 - 03:09:33 EST


From: Michael Stapelberg <michael+lkml@xxxxxxxxxxxxx>

Hey,

I recently ran into this, too. The symptom for me is that processes using the
affected FUSE file system hang indefinitely, sync(2) system calls hang
indefinitely, and even triggering an abort via echo 1 >
/sys/fs/fuse/connections/*/abort does not get the file system unstuck (there is
always 1 request still pending). Only removing power will get the machine
unstuck.
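
For anyone trying to confirm the same symptom, here is a minimal sketch of how the pending-request count can be checked and an abort attempted. It assumes the fusectl file system is mounted at /sys/fs/fuse/connections and that each connection directory exposes the "waiting" and "abort" files. The function names fuse_waiting and fuse_abort_all are hypothetical helpers made up for illustration, as is the optional directory argument.

```shell
#!/bin/sh
# Print "<connection id> <pending requests>" for every FUSE connection.
# The optional first argument overrides the control directory (an
# illustrative convenience, not something the kernel provides).
fuse_waiting() {
    base="${1:-/sys/fs/fuse/connections}"
    for conn in "$base"/*/; do
        # Skip anything without a readable "waiting" file
        # (this also covers the case where the glob matched nothing).
        [ -r "${conn}waiting" ] || continue
        printf '%s %s\n' "$(basename "$conn")" "$(cat "${conn}waiting")"
    done
}

# Ask the kernel to abort every FUSE connection (needs root on a real system).
# Writing anything into "abort" triggers the abort.
fuse_abort_all() {
    base="${1:-/sys/fs/fuse/connections}"
    for conn in "$base"/*/; do
        [ -w "${conn}abort" ] || continue
        echo 1 > "${conn}abort"
    done
}
```

Running fuse_waiting before and after fuse_abort_all shows whether the count drops to 0. In the hang described above it stays at 1.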

I'm triggering this when building packages for https://distr1.org/, which uses a
FUSE daemon (written in Go using the jacobsa/fuse package) to provide package
contents.

I bisected the issue to commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2b319d1f6f92a4ced9897678113d176ee16ae85d

With that commit, I run into a kernel oops within about 1 minute after starting my
batch build. With the commit before, I can batch build for many minutes without
issues.

…and just in case it matters, building linux from HEAD
(f757165705e92db62f85a1ad287e9251d1f2cd82) with that commit reverted results in
a working kernel, too.

Find below a full backtrace from kgdb, with fs/fuse/dev.c compiled with -O0:

(gdb) bt full
#0 0xffff888139a36600 in ?? ()
No symbol table info available.
#1 0xffffffff8137b368 in fuse_request_end (fc=0xffff888139a36600,
req=0xffff8880b7f333b8) at fs/fuse/dev.c:328
fiq = 0xffff888139a36648
async = true
#2 0xffffffff8137f488 in fuse_dev_do_write (fud=0xffff888139a36600,
cs=0xffffc9000dd7fa58, nbytes=4294967294) at fs/fuse/dev.c:1911
err = 0
fc = 0xffff888139a36600
fpq = 0xffff8881390e5148
req = 0xffff8880b7f333b8
oh = {len = 16, error = -2, unique = 2692038}
#3 0xffffffff8137f569 in fuse_dev_write (iocb=0xffffc9000093be48,
from=0xffffc9000093be20) at fs/fuse/dev.c:1933
cs = {write = 0, req = 0xffff8880b7f333b8, iter =
0xffffc9000093be20, pipebufs = 0x0 <fixed_percpu_data>, currbuf = 0x0
<fixed_percpu_data>, pipe = 0x0 <fixed_percpu_data>, nr_segs = 0,
pg = 0x0 <fixed_percpu_data>, len = 0, offset = 24, move_pages = 0}
fud = 0xffff8881390e5140
#4 0xffffffff811fe4de in call_write_iter (file=<optimized out>,
iter=<optimized out>, kio=<optimized out>) at
./include/linux/fs.h:1902
No locals.
#5 new_sync_write (filp=0xffff888123800800, buf=<optimized out>,
len=<optimized out>, ppos=0xffffc9000093bee8) at fs/read_write.c:483
iov = {iov_base = 0xc00082a008, iov_len = 16}
kiocb = {ki_filp = 0xffff888123800800, ki_pos = 0, ki_complete
= 0x0 <fixed_percpu_data>, private = 0x0 <fixed_percpu_data>, ki_flags
= 0, ki_hint = 0, ki_ioprio = 0, ki_cookie = 0}
iter = {type = 5, iov_offset = 0, count = 0, {iov =
0xffffc9000093be20, kvec = 0xffffc9000093be20, bvec =
0xffffc9000093be20, pipe = 0xffffc9000093be20}, {nr_segs = 0, {head =
0,
start_head = 0}}}
ret = <optimized out>
#6 0xffffffff811fe594 in __vfs_write (file=<optimized out>,
p=<optimized out>, count=<optimized out>, pos=<optimized out>) at
fs/read_write.c:496
No locals.
#7 0xffffffff81200fa4 in vfs_write (pos=<optimized out>, count=16,
buf=<optimized out>, file=<optimized out>) at fs/read_write.c:558
ret = 16
ret = <optimized out>
#8 vfs_write (file=0xffff888123800800, buf=0xc00082a008 "\020",
count=16, pos=0xffffc9000093bee8) at fs/read_write.c:542
ret = 16
#9 0xffffffff81201252 in ksys_write (fd=<optimized out>,
buf=0xc00082a008 "\020", count=16) at fs/read_write.c:611
pos = 0
ppos = <optimized out>
f = <optimized out>
ret = 824642281480
#10 0xffffffff812012e5 in __do_sys_write (count=<optimized out>,
buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:623
No locals.
#11 __se_sys_write (count=<optimized out>, buf=<optimized out>,
fd=<optimized out>) at fs/read_write.c:620
ret = <optimized out>
ret = <optimized out>
#12 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:620
No locals.
#13 0xffffffff810027f8 in do_syscall_64 (nr=<optimized out>,
regs=0xffffc9000093bf58) at arch/x86/entry/common.c:294
ti = <optimized out>
#14 0xffffffff81e0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
No locals.
#15 0x0000000000000000 in ?? ()
No symbol table info available.