Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

From: Al Viro
Date: Sun Sep 03 2023 - 19:13:50 EST


On Mon, Sep 04, 2023 at 08:27:15AM +1000, Dave Chinner wrote:

> It already is (sysrq-t), but I'm not sure that will help - if it is
> a leaked unlock then nothing will show up at all.

Unlikely; grep and you'll see - very few callers, and for all of them
there's an fdput_pos() downstream of any fdget_pos() that had picked up
a non-NULL file reference.

In theory, it's not impossible that something had stripped FDPUT_POS_UNLOCK
from the flags, but that's basically the "something might've corrupted the
local variables" scenario. There are 12 functions total where we might
be calling fdget_pos(), and all of them are pretty small (1 in alpha
osf_sys.c, 6 in read_write.c and 5 in readdir.c); none of those takes
the address of its struct fd, none of them assigns to it after fdget_pos(),
and the only accesses to its members are those to fd.file - all fetches.
Control flow is also easy to check - they are all short.
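
A userspace model of the pairing described above may make the argument
easier to follow. It is a sketch under stated assumptions, not kernel code:
model_file, model_fd, model_fdget_pos(), model_fdput_pos(), model_read()
and MODEL_FDPUT_POS_UNLOCK are hypothetical names loosely modelled on
struct fd, FDPUT_POS_UNLOCK and ->f_pos_lock, with a pthread mutex standing
in for the kernel mutex. The strip_flag parameter exists only to simulate
the "corrupted local variable" case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Userspace model only: every name here is hypothetical. */
#define MODEL_FDPUT_POS_UNLOCK 2u

struct model_file {
    pthread_mutex_t f_pos_lock;   /* stands in for file->f_pos_lock */
    long f_pos;
    int needs_pos_lock;           /* hypothetical: "does this file need the lock?" */
};

struct model_fd {
    struct model_file *file;
    unsigned int flags;
};

static void model_file_init(struct model_file *file, int needs_pos_lock)
{
    pthread_mutex_init(&file->f_pos_lock, NULL);
    file->f_pos = 0;
    file->needs_pos_lock = needs_pos_lock;
}

/* Take f_pos_lock when required and record that fact in the flags. */
static struct model_fd model_fdget_pos(struct model_file *file)
{
    struct model_fd f = { file, 0 };

    if (file && file->needs_pos_lock) {
        pthread_mutex_lock(&file->f_pos_lock);
        f.flags |= MODEL_FDPUT_POS_UNLOCK;
    }
    return f;
}

/* Unlock only if the flags say model_fdget_pos() took the lock. */
static void model_fdput_pos(struct model_fd f)
{
    if (f.flags & MODEL_FDPUT_POS_UNLOCK) {
        assert(f.file != NULL);
        pthread_mutex_unlock(&f.file->f_pos_lock);
    }
}

/* Shaped like the short callers: the address of f is never taken, and
 * model_fdput_pos() runs on every path that got a non-NULL file. The
 * only assignment to f after model_fdget_pos() is the deliberate strip. */
static long model_read(struct model_file *file, long count, int strip_flag)
{
    struct model_fd f = model_fdget_pos(file);
    long ret = -EBADF;

    if (f.file) {
        f.file->f_pos += count;               /* stand-in for the actual I/O */
        ret = count;
        if (strip_flag)                       /* simulated corruption of f.flags */
            f.flags &= ~MODEL_FDPUT_POS_UNLOCK;
        model_fdput_pos(f);
    }
    return ret;
}

/* 1 if nobody holds f_pos_lock, i.e. the next model_fdget_pos() would not block. */
static int model_pos_lock_is_free(struct model_file *file)
{
    if (pthread_mutex_trylock(&file->f_pos_lock) != 0)
        return 0;
    pthread_mutex_unlock(&file->f_pos_lock);
    return 1;
}
```

With strip_flag == 0, every call that got a non-NULL file leaves
f_pos_lock free afterwards. With strip_flag == 1 the lock is still held
after model_read() returns, so the next model_fdget_pos() on that file
would block; that is what a leaked unlock would look like.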

IMO it's much more likely that we'll find something like

thread A:
	grabs some fs lock
	gets stuck on something
thread B: write()
	finds file
	grabs ->f_pos_lock
	calls into filesystem
	blocks on fs lock held by A
thread C: read()/write()/lseek() on the same file
	blocks on ->f_pos_lock
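
The A/B/C chain above can be reproduced in userspace as a rough sketch.
Everything here is an assumption for illustration: fs_lock stands in for
"some fs lock", f_pos_lock for ->f_pos_lock, the calling thread plays A,
and C is probed with pthread_mutex_trylock() instead of blocking, so that
the program terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for "some fs lock" and file->f_pos_lock. */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t f_pos_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int b_has_pos_lock;

/* Thread B: write() grabs f_pos_lock, then blocks on the fs lock held by A. */
static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&f_pos_lock);
    atomic_store(&b_has_pos_lock, 1);
    pthread_mutex_lock(&fs_lock);        /* waits until A lets go */
    pthread_mutex_unlock(&fs_lock);
    pthread_mutex_unlock(&f_pos_lock);
    return NULL;
}

/* Returns 1 if C found f_pos_lock busy while A was stuck and could take
 * it once A released the fs lock; -1 if the helper thread failed to start. */
static int run_chain(void)
{
    pthread_t b;
    int c_blocked, c_after, rc;

    atomic_store(&b_has_pos_lock, 0);
    pthread_mutex_lock(&fs_lock);        /* thread A: grabs fs lock, "gets stuck" */
    if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
        pthread_mutex_unlock(&fs_lock);
        return -1;
    }
    while (!atomic_load(&b_has_pos_lock))
        sched_yield();                   /* wait until B holds f_pos_lock */

    /* Thread C, probed with trylock so the demo does not hang. */
    c_blocked = pthread_mutex_trylock(&f_pos_lock) == EBUSY;

    pthread_mutex_unlock(&fs_lock);      /* A gets unstuck; the chain drains */
    rc = pthread_join(b, NULL);
    assert(rc == 0);
    (void)rc;

    c_after = pthread_mutex_trylock(&f_pos_lock) == 0;
    if (c_after)
        pthread_mutex_unlock(&f_pos_lock);
    return c_blocked && c_after;
}
```

run_chain() should return 1: C finds f_pos_lock busy for as long as A sits
on the fs lock, and gets it as soon as A lets go. In this picture the task
reported as hung in __fdget_pos would be C, while the actual culprit is
whatever A is stuck on.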