On Mon, Mar 24, 2025 at 3:52 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
So far, with tracing, this is where I am:
o Mainline + Oleg's optimization reverted:
...
kworker/43:1-1723 [043] ..... 115.309065: p9_read_work: Data read wait 55
kworker/43:1-1723 [043] ..... 115.309066: p9_read_work: Data read 55
kworker/43:1-1723 [043] ..... 115.309067: p9_read_work: Data read wait 7
kworker/43:1-1723 [043] ..... 115.309068: p9_read_work: Data read 7
repro-4138 [043] ..... 115.309084: netfs_wake_write_collector: Wake collector
repro-4138 [043] ..... 115.309085: netfs_wake_write_collector: Queuing collector work
repro-4138 [043] ..... 115.309088: netfs_unbuffered_write: netfs_unbuffered_write
repro-4138 [043] ..... 115.309088: netfs_end_issue_write: netfs_end_issue_write
repro-4138 [043] ..... 115.309089: netfs_end_issue_write: Write collector need poke 0
repro-4138 [043] ..... 115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
kworker/u1030:1-1951 [168] ..... 115.309096: netfs_wake_write_collector: Wake collector
kworker/u1030:1-1951 [168] ..... 115.309097: netfs_wake_write_collector: Queuing collector work
kworker/u1030:1-1951 [168] ..... 115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
... (syzbot reproducer continues)
o Mainline:
kworker/185:1-1767 [185] ..... 109.485961: p9_read_work: Data read wait 7
kworker/185:1-1767 [185] ..... 109.485962: p9_read_work: Data read 7
kworker/185:1-1767 [185] ..... 109.485962: p9_read_work: Data read wait 55
kworker/185:1-1767 [185] ..... 109.485963: p9_read_work: Data read 55
repro-4038 [185] ..... 114.225717: netfs_wake_write_collector: Wake collector
repro-4038 [185] ..... 114.225723: netfs_wake_write_collector: Queuing collector work
repro-4038 [185] ..... 114.225727: netfs_unbuffered_write: netfs_unbuffered_write
repro-4038 [185] ..... 114.225727: netfs_end_issue_write: netfs_end_issue_write
repro-4038 [185] ..... 114.225728: netfs_end_issue_write: Write collector need poke 0
repro-4038 [185] ..... 114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
... (syzbot reproducer hangs)
With Oleg's optimization, there is a third "kworker/u1030" component
that never gets woken up, for reasons currently unknown to me. I'll
keep digging.
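One way to make the comparison between the two traces mechanical is to diff which kinds of tasks show up in each. The following is only a sketch: it assumes the line layout seen in the excerpts above ("comm-pid [cpu] flags timestamp: event: message"), and the helper names task_kinds() and missing_task_kinds() are made up for illustration. Digits in the task name are collapsed to "N" so that kworkers bound to different CPUs compare equal.

```python
import re

# Assumed layout, taken from the excerpts above:
#   <comm>-<pid> [<cpu>] <flags> <timestamp>: <event>: <message>
TRACE_LINE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s*(?P<msg>.*)$"
)


def task_kinds(lines):
    """Map a normalized task name to the set of events it emitted."""
    kinds = {}
    for line in lines:
        m = TRACE_LINE.match(line)
        if m is None:
            continue  # skip "..." and any other non-trace line
        # "kworker/43:1" and "kworker/185:1" both become "kworker/N:N".
        kind = re.sub(r"\d+", "N", m.group("comm"))
        kinds.setdefault(kind, set()).add(m.group("event"))
    return kinds


def missing_task_kinds(good_lines, bad_lines):
    """Return task kinds (with their events) present only in the good trace."""
    good = task_kinds(good_lines)
    bad = task_kinds(bad_lines)
    return {kind: events for kind, events in good.items() if kind not in bad}
```

Fed the reverted trace as good_lines and the mainline trace as bad_lines, this should single out the unbound "kworker/uN:N" worker and the events it would have emitted. It does not say why the worker is missing, only which one to look for.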
Thanks for the update.
It is unclear to me if you checked, so I'm going to have to ask just
in case: when there is a hang, is there *anyone* stuck in pipe code
(and if so, where)?
You can get the kernel to dump the stacks of all threads to the kernel
log (readable with dmesg) via sysrq:

  echo t > /proc/sysrq-trigger
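
The dump for a big machine is long, so a small filter may help answer the "is anyone in pipe code" question. The following is a rough sketch, not a vetted tool. It assumes each task starts with a header line of the form "task:<comm> state:<S> ... pid:<n>" (older kernels print a different header), that stack frames look like "symbol+0xoff/0xsize", and that the functions of interest start with "pipe_". The function name tasks_in_pipe_code() is made up for illustration.

```python
import re
import sys

# Assumption: per-task header as printed by recent kernels.
TASK_HEADER = re.compile(
    r"task:(?P<comm>\S+)\s+state:(?P<state>\S)\s.*?pid:(?P<pid>\d+)"
)
# Assumption: frames look like "symbol+0xoff/0xsize"; a leading "? "
# marks a frame the unwinder considers unreliable.
FRAME = re.compile(
    r"(?P<unreliable>\?\s+)?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def tasks_in_pipe_code(dump_text, prefix="pipe_"):
    """Return tasks whose reliable stack frames include a prefix match."""
    tasks = []
    current = None
    for line in dump_text.splitlines():
        header = TASK_HEADER.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "state": header.group("state"),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if current is None or frame is None or frame.group("unreliable"):
            continue
        current["frames"].append(frame.group("sym"))
    return [
        t for t in tasks
        if any(sym.startswith(prefix) for sym in t["frames"])
    ]


if __name__ == "__main__":
    # Reads the kernel log from stdin and prints one block per match.
    for task in tasks_in_pipe_code(sys.stdin.read()):
        print(f"{task['comm']} pid={task['pid']} state={task['state']}")
        for sym in task["frames"]:
            print(f"    {sym}")
```

Saved under a hypothetical name such as find_pipe_waiters.py, it could be run as "dmesg | python3 find_pipe_waiters.py" right after triggering the dump. With this many threads the kernel log buffer may wrap before everything is captured, so an empty result is not proof that nobody is stuck there.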