Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
From: K Prateek Nayak
Date: Mon Mar 24 2025 - 10:54:17 EST
Hello folks,
Some updates.
On 3/24/2025 6:49 PM, K Prateek Nayak wrote:
Per syzbot this attempt did not work out either.
I think the blind stabs taken by everyone here are enough.
The report does not provide the crucial bit: what are the other
threads doing. Presumably someone else is stuck somewhere, possibly
not even in pipe code and that stuck thread was supposed to wake up
the one which trips over hung task detector. Figuring out what that
thread is imo the next step.
I failed to find a relevant command in
https://github.com/google/syzkaller/blob/master/docs/syzbot.md
So if you guys know someone on syzkaller side, maybe you can ask them
to tweak the report *or* maybe syzbot can test a "fix" which makes
hung task detector also report all backtraces? I don't know if that
can work, the output may be long enough that it will get trimmed by
something.
I don't have to time work on this for now, just throwing ideas.
I got the reproducer running locally. Tracing stuff currently to see
what is tripping. Will report back once I find something interesting.
Might take a while since the 9p bits are so far spread out.
So far, with tracing, this is where I'm:
o Mainline + Oleg's optimization reverted:
...
kworker/43:1-1723 [043] ..... 115.309065: p9_read_work: Data read wait 55
kworker/43:1-1723 [043] ..... 115.309066: p9_read_work: Data read 55
kworker/43:1-1723 [043] ..... 115.309067: p9_read_work: Data read wait 7
kworker/43:1-1723 [043] ..... 115.309068: p9_read_work: Data read 7
repro-4138 [043] ..... 115.309084: netfs_wake_write_collector: Wake collector
repro-4138 [043] ..... 115.309085: netfs_wake_write_collector: Queuing collector work
repro-4138 [043] ..... 115.309088: netfs_unbuffered_write: netfs_unbuffered_write
repro-4138 [043] ..... 115.309088: netfs_end_issue_write: netfs_end_issue_write
repro-4138 [043] ..... 115.309089: netfs_end_issue_write: Write collector need poke 0
repro-4138 [043] ..... 115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
kworker/u1030:1-1951 [168] ..... 115.309096: netfs_wake_write_collector: Wake collector
kworker/u1030:1-1951 [168] ..... 115.309097: netfs_wake_write_collector: Queuing collector work
kworker/u1030:1-1951 [168] ..... 115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
... (syzbot reproducer continues)
o Mainline:
kworker/185:1-1767 [185] ..... 109.485961: p9_read_work: Data read wait 7
kworker/185:1-1767 [185] ..... 109.485962: p9_read_work: Data read 7
kworker/185:1-1767 [185] ..... 109.485962: p9_read_work: Data read wait 55
kworker/185:1-1767 [185] ..... 109.485963: p9_read_work: Data read 55
repro-4038 [185] ..... 114.225717: netfs_wake_write_collector: Wake collector
repro-4038 [185] ..... 114.225723: netfs_wake_write_collector: Queuing collector work
repro-4038 [185] ..... 114.225727: netfs_unbuffered_write: netfs_unbuffered_write
repro-4038 [185] ..... 114.225727: netfs_end_issue_write: netfs_end_issue_write
repro-4038 [185] ..... 114.225728: netfs_end_issue_write: Write collector need poke 0
repro-4038 [185] ..... 114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
... (syzbot reproducer hangs)
There is a third "kworker/u1030" component that never gets woken up for
reasons currently unknown to me with Oleg's optimization. I'll keep
digging.
--
Thanks and Regards,
Prateek