Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter
From: Oleg Nesterov
Date: Wed Mar 26 2025 - 08:23:33 EST
On 03/25, Dominique Martinet wrote:
>
> Thanks for the traces.
>
> w/ revert
> K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530:
> > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll
> > kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll
> > kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7
>
> new behavior
> > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll
> > repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll
> > repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512)
>
> For me the problem isn't so much that this gets ERESTARTSYS but that it
> never gets to read the 7 bytes that are available?
Yes...
OK, let's first recall what commit aaec5a95d59615523 ("pipe_read:
don't wake up the writer if the pipe is still full") does.
It simply removes the unnecessary/spurious wakeups when the writer
can't add more data to the pipe.
See the "stupid test-case" in
https://lore.kernel.org/all/20250120144338.GC7432@xxxxxxxxxx/
In particular this note:
	As you can see, without this patch pipe_read() wakes the writer up
	4095 times for no reason, the writer burns a bit of CPU and blocks
	again after wakeup until the last read(fd[0], &c, 1).
In this test-case the writer sleeps in pipe_write(), but the same is true
for the task sleeping in poll( { .fd = pipe_fd, .events = POLLOUT}, ...).
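To make the POLLOUT side concrete, here is a minimal userspace sketch (not the test-case from the link above). It assumes Linux pipe semantics, where a pipe counts as writable only once a whole buffer slot is free. The names pollout_result, pipe_writable(), fill_pipe(), drain() and pollout_demo() are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* What poll(POLLOUT) reported at three points in time (1 = writable). */
struct pollout_result {
	int when_full;
	int after_one_byte;
	int after_one_page;
};

/* Return 1 if poll() reports POLLOUT on fd right now, else 0. */
static int pipe_writable(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };

	return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLOUT) != 0;
}

/* Fill the pipe with page-sized non-blocking writes until EAGAIN. */
static void fill_pipe(int wfd, size_t page)
{
	char *buf = calloc(1, page);
	ssize_t n;

	assert(buf);
	do {
		n = write(wfd, buf, page);
	} while (n > 0);
	assert(n < 0 && errno == EAGAIN);
	free(buf);
}

/* Read exactly len bytes from rfd; the data is known to be there. */
static void drain(int rfd, size_t len)
{
	char c[256];

	while (len) {
		ssize_t n = read(rfd, c, len < sizeof(c) ? len : sizeof(c));

		if (n <= 0)
			abort();
		len -= (size_t)n;
	}
}

static struct pollout_result pollout_demo(void)
{
	struct pollout_result r;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int fd[2];

	if (pipe(fd) != 0 || fcntl(fd[1], F_SETFL, O_NONBLOCK) != 0)
		abort();

	fill_pipe(fd[1], page);
	r.when_full = pipe_writable(fd[1]);

	drain(fd[0], 1);		/* oldest slot still holds data */
	r.after_one_byte = pipe_writable(fd[1]);

	drain(fd[0], page - 1);		/* oldest slot is now empty */
	r.after_one_page = pipe_writable(fd[1]);

	close(fd[0]);
	close(fd[1]);
	return r;
}
```

Calling pollout_demo() from main() and printing the three fields should show when_full == 0 and after_one_page == 1. On Linux I would also expect after_one_byte == 0, because the writer has no free slot yet, so a wakeup at that point can't help it make progress.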
Now, after some grepping I have found
	static void p9_conn_create(struct p9_client *client)
	{
		...
		init_poll_funcptr(&m->pt, p9_pollwait);
		n = p9_fd_poll(client, &m->pt, NULL);
		...
	}
So, iiuc, in this case p9_fd_poll(&m->pt /* != NULL */) -> p9_pollwait()
paths will add the "dummy" pwait->wait entries with ->func = p9_pollwake
to pipe_inode_info.rd_wait and pipe_inode_info.wr_wait.
Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the
ret from vfs_poll(ts->rd), but I assume this is correct.
This means that every time pipe_read() does wake_up(&pipe->wr_wait)
p9_pollwake() is called. This function kicks p9_poll_workfn() which
calls p9_poll_mux() which calls p9_fd_poll() again with pt == NULL.
In this case the conditional vfs_poll(ts->wr) looks more understandable...
So. Without the commit above, p9_poll_mux()->p9_fd_poll() can be called
much more often and, in particular, can report the "additional" EPOLLIN.
Can this somehow explain the problem?
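To illustrate "much more often", here is a toy userspace model, not kernel code. It only counts how many times a callback queued on wr_wait (standing in for p9_pollwake()) would run while a reader drains one page of a full pipe one byte at a time. The 16-slot / 4096-byte geometry and every toy_* name are assumptions made for the sketch, and the wake conditions are a simplification of what pipe_read() does before and after the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SLOTS	16	/* assumed number of pipe buffer slots */
#define TOY_PAGE_SIZE	4096	/* assumed page size */

struct toy_pipe {
	int used_slots;		/* slots currently holding data */
	int first_len;		/* bytes left in the oldest slot */
	long callbacks;		/* times the wr_wait callback ran */
};

/* Stand-in for a wait-queue entry such as p9_pollwake(): just count. */
static void toy_wake_writers(struct toy_pipe *p)
{
	p->callbacks++;
}

static bool toy_full(const struct toy_pipe *p)
{
	return p->used_slots == TOY_SLOTS;
}

/*
 * Read one byte. With wake_if_still_full the writer side is woken
 * whenever the pipe was full at the start of the read (older
 * behaviour); without it, only when this read actually freed a slot.
 */
static void toy_read_byte(struct toy_pipe *p, bool wake_if_still_full)
{
	bool was_full = toy_full(p);
	bool freed_slot = false;

	assert(p->used_slots > 0);
	if (--p->first_len == 0) {
		p->used_slots--;
		p->first_len = TOY_PAGE_SIZE;	/* next slot becomes the oldest */
		freed_slot = true;
	}
	if (was_full && (wake_if_still_full || freed_slot))
		toy_wake_writers(p);
}

/* Drain one page byte by byte from a full pipe, return callback count. */
static long toy_count_callbacks(bool wake_if_still_full)
{
	struct toy_pipe p = {
		.used_slots = TOY_SLOTS,
		.first_len = TOY_PAGE_SIZE,
	};

	for (int i = 0; i < TOY_PAGE_SIZE; i++)
		toy_read_byte(&p, wake_if_still_full);
	return p.callbacks;
}
```

In this model toy_count_callbacks(true) gives 4096 and toy_count_callbacks(false) gives 1, so the difference is the 4095 wakeups mentioned in the note quoted earlier. In the 9p case each of those would have been one more chance for p9_poll_mux() to re-poll both ends.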
Oleg.