Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

From: Al Viro
Date: Sun Sep 03 2023 - 23:02:43 EST


On Mon, Sep 04, 2023 at 11:45:03AM +1000, Dave Chinner wrote:

> > thread B: write()
> > finds file
> > grabs ->f_pos_lock
> > calls into filesystem
> > blocks on fs lock held by A
> > thread C: read()/write()/lseek() on the same file
> > blocks on ->f_pos_lock
>
> Yes, that's exactly what I said in a followup email - we need to
> know what happened to thread A, because that might be where we are
> stuck on a leaked lock.
>
> I saw quite a few reports where lookup/readdir are also stuck trying
> to get an inode lock - those are the "thread B"s in the above example
> - but there's no indication left of what happened with thread A.
>
> If thread A was blocked all that time on something, then the hung
> task timer should fire on it, too. If it is running in a tight
> loop, the NMI would have dumped a stack trace from it.
>
> But neither of those things happened, so it's either leaked
> something or it's in a loop with a short term sleep so doesn't
> trigger the hung task timer. sysrq-w output will capture that
> without all the noise of sysrq-t....
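
For anyone following along, the A/B/C chain quoted at the top can be
modelled in userspace. This is only a rough sketch under stated
assumptions: fs_lock and f_pos_lock are plain Python locks standing in
for the filesystem lock and file->f_pos_lock, simulate_chain() is a
made-up name, and none of it is kernel code.

```python
import threading


def simulate_chain(timeout=0.2):
    """Userspace model of the A/B/C chain; all names are illustrative."""
    fs_lock = threading.Lock()      # stands in for the fs lock held by A
    f_pos_lock = threading.Lock()   # stands in for file->f_pos_lock
    b_has_pos = threading.Event()

    # thread A: holds the fs lock (played here by the calling thread)
    fs_lock.acquire()

    def thread_b():
        with f_pos_lock:            # grabs ->f_pos_lock
            b_has_pos.set()
            with fs_lock:           # blocks on fs lock held by A
                pass

    b = threading.Thread(target=thread_b)
    b.start()
    b_has_pos.wait()                # B now owns f_pos_lock and is stuck

    # thread C: read()/write()/lseek() on the same file
    c_got_lock = f_pos_lock.acquire(timeout=timeout)
    if c_got_lock:
        f_pos_lock.release()

    # A finally lets go, so B finishes and drops f_pos_lock
    fs_lock.release()
    b.join()
    free_afterwards = f_pos_lock.acquire(timeout=timeout)
    if free_afterwards:
        f_pos_lock.release()

    return c_got_lock, free_afterwards
```

simulate_chain() returns (False, True): C times out on f_pos_lock for as
long as A sits on the fs lock, and the lock is free again once A lets go.
The only trace that explains the hang is the one for A, which is the one
the reports are missing.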

Here's what brought up sysrq-t:

| > The report does not have info necessary to figure this out -- no
| > backtrace for whichever thread holds f_pos_lock. I clicked on a
| > bunch of other reports and it is the same story.
| >
| > Can the kernel be configured to dump backtraces from *all* threads?
| >
| > If there is no feature like that I can hack it up.
|
| <break>t
|
| over serial console, or echo t >/proc/sysrq-trigger would do it...

A question specifically about getting the stack traces...
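
For scripted reproducers, the same trigger can be written from a program
instead of the shell. A minimal sketch, assuming root and a kernel built
with sysrq support; trigger_sysrq() and its path parameter are
hypothetical helpers for illustration, not an existing API. 't' dumps
every task, 'w' only the blocked (uninterruptible) ones, and the output
lands in the kernel log.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key, path=SYSRQ_TRIGGER):
    """Write a single sysrq command character to the trigger file.

    The path parameter exists only so the helper can be pointed at a
    scratch file for a dry run; on a real system the default is used.
    """
    if len(key) != 1:
        raise ValueError("sysrq command must be a single character")
    with open(path, "w") as f:
        f.write(key)


# Usage, as root:
#   trigger_sysrq("t")   # all tasks, same as: echo t >/proc/sysrq-trigger
#   trigger_sysrq("w")   # blocked tasks only
```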