Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter

From: K Prateek Nayak
Date: Tue Mar 25 2025 - 10:51:07 EST


Hello Oleg, Dominique,

On 3/25/2025 6:34 PM, Oleg Nesterov wrote:
On 03/25, K Prateek Nayak wrote:

I chased this down to p9_client_rpc() net/9p/client.c specifically:

err = c->trans_mod->request(c, req);
if (err < 0) {
/* write won't happen */
p9_req_put(c, req);
if (err != -ERESTARTSYS && err != -EFAULT)
c->status = Disconnected;
goto recalc_sigpending;
}

c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
which basically does a p9_fd_poll().

Again, I know nothing about 9p... but if p9_fd_request() returns
an err < 0, then it comes from p9_conn->err and p9_fd_request()
does nothing else.

Previously, the above would fail with err as -EIO which would
cause the client to "Disconnect" and the retry logic would make
progress. Now however, the err returned is -ERESTARTSYS

OK... So p9_conn->err = -ERESTARTSYS was set by p9_conn_cancel()
called by p9_write_work() because pipe_write() returns ERESTARTSYS?

But I don't understand -EIO with the reverted commit aaec5a95d59615

Okay it is a long explanation. These are the relevant bits of
traces with aaec5a95d59615 reverted on mainline and I'll break it
down to the best of my abilities:

kworker/100:1-1803 [100] ..... 286.618784: p9_write_work: Data write wait 32770
kworker/100:1-1803 [100] ..... 286.618792: p9_write_work: Data written 28672
kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_poll rd poll
kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_request wr poll
kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_poll rd poll
kworker/100:1-1803 [100] ..... 286.618793: p9_fd_poll: p9_fd_request wr poll
kworker/100:1-1803 [100] ..... 286.618795: p9_read_work: Data read wait 7
kworker/100:1-1803 [100] ..... 286.618796: p9_read_work: Data read 7
kworker/100:1-1803 [100] ..... 286.618796: p9_fd_poll: p9_fd_poll rd poll
kworker/100:1-1803 [100] ..... 286.618796: p9_fd_poll: p9_fd_request wr poll
kworker/100:1-1803 [100] ..... 286.618797: p9_fd_poll: p9_fd_poll rd poll
kworker/100:1-1803 [100] ..... 286.618797: p9_fd_poll: p9_fd_request wr poll
kworker/100:1-1803 [100] ..... 286.618797: p9_read_work: Data read wait 55
kworker/100:1-1803 [100] ..... 286.618798: p9_read_work: Data read 55
repro-4102 [100] ..... 286.618802: p9_client_write: p9_client_rpc done
repro-4102 [100] ..... 286.618804: p9_client_write: p9_pdup
repro-4102 [100] ..... 286.618804: p9_client_write: iter revert

==== the above bits are common between the two ===

repro-4102 [100] ..... 286.618805: p9_client_write: p9_client_rpc
repro-4102 [100] ..... 286.618817: p9_fd_request: p9_fd_request
repro-4102 [100] ..... 286.618818: p9_fd_request: p9_fd_request EPOLL
repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_poll rd poll
repro-4102 [100] ..... 286.618818: p9_fd_poll: p9_fd_request wr poll
kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_poll rd poll
kworker/100:1-1803 [100] ..... 286.618822: p9_fd_poll: p9_fd_request wr poll
kworker/100:1-1803 [100] ..... 286.618823: p9_read_work: Data read wait 7
kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Data read 7
kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: p9_tag_lookup error

^^^^^^^^^^^^^^^^^^^
==== In this case the read succeeds but p9_tag_lookup() later will error out ====
==== Connection is cancelled with m->err as -EIO ====

kworker/100:1-1803 [100] ..... 286.618825: p9_read_work: Connection cancel
kworker/100:1-1803 [100] ...1. 286.618826: p9_conn_cancel: Connection cancelled (-5)
repro-4102 [100] ..... 286.618831: p9_client_write: p9_client_rpc done
repro-4102 [100] ..... 286.618832: p9_client_write: p9_client_rpc error (-5)

^^^
==== This error from kworker/100 is received by the reproducer ====

repro-4102 [100] ..... 286.618833: v9fs_issue_write: Issue write done 2 err(-5)
repro-4102 [100] ..... 286.618833: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated
repro-4102 [100] ..... 286.618834: netfs_wake_write_collector: Wake collector
repro-4102 [100] ..... 286.618834: netfs_wake_write_collector: Queuing collector work
repro-4102 [100] ..... 286.618837: v9fs_issue_write: Issue write subrequest terminated 2
repro-4102 [100] ..... 286.618838: netfs_unbuffered_write: netfs_unbuffered_write
repro-4102 [100] ..... 286.618838: netfs_end_issue_write: netfs_end_issue_write
repro-4102 [100] ..... 286.618838: netfs_end_issue_write: Write collector need poke 0
repro-4102 [100] ..... 286.618839: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!

==== A reissue mechanism triggers below ====

kworker/u1037:2-3232 [096] ..... 286.618845: netfs_retry_writes: netfs_reissue_write 1
kworker/u1037:2-3232 [096] ..... 286.618846: v9fs_issue_write: v9fs_issue_write
kworker/u1037:2-3232 [096] ..... 286.618847: p9_client_write: p9_client_rpc
kworker/u1037:2-3232 [096] ..... 286.618873: p9_fd_request: p9_fd_request
kworker/u1037:2-3232 [096] ...1. 286.618874: p9_fd_request: p9_fd_request error
kworker/u1037:2-3232 [096] ..... 286.618874: p9_client_rpc: Client disconnected (no write)

==== Connection with client is disconnected ====

kworker/u1037:2-3232 [096] ..... 286.618877: p9_client_write: p9_client_rpc done
kworker/u1037:2-3232 [096] ..... 286.618880: p9_client_write: p9_client_rpc error (-5)
kworker/u1037:2-3232 [096] ..... 286.618881: v9fs_issue_write: Issue write done 0 err(-5)
kworker/u1037:2-3232 [096] ..... 286.618882: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated
kworker/u1037:2-3232 [096] ..... 286.618882: netfs_wake_write_collector: Wake collector
kworker/u1037:2-3232 [096] ..... 286.618882: netfs_wake_write_collector: Queuing collector work
kworker/u1037:2-3232 [096] ..... 286.618887: v9fs_issue_write: Issue write subrequest terminated 0
kworker/u1037:2-3232 [096] ..... 286.618890: netfs_write_collection_worker: Write collect clearing and waking up!

==== The retry worker will clear the NETFS_RREQ_IN_PROGRESS bit and issue a wakeup ====

repro-4102 [100] ..... 286.619051: p9_client_prepare_req: p9_client_prepare_req eio 1
repro-4102 [100] ..... 286.619052: p9_client_rpc: p9_client_rpc early err return
repro-4108 [240] ..... 286.621325: p9_fd_create: Client connected

==== The reproducer will then re-establish the connection and the pattern repeats ====

---

So the pattern is with the optimization reverted is:

o The RPC thread reads "7" bytes.
o p9_tag_lookup() fails giving -EIO.
o connection is cancelled from rpc thread.

o The reproducer thread receives the packet and finds -EIO.
o The reproducer starts a Disconnect.
o The reproducer wakes up collector and sleeps on bit
NETFS_RREQ_IN_PROGRESS

o A retry worker wakes up.
o Finds connection has been disconnected
o Wakes up the reproducer thread.

o Reproducer wakes up and establishes connection.

I don't know 9p bits enough to understand the intermediates steps but
this is the rough flow.

---

Now for the traces on vanilla mainline:

kworker/31:1-1723 [031] ..... 95.011362: p9_write_work: Data write wait 32770
kworker/31:1-1723 [031] ..... 95.011372: p9_write_work: Data written 28672
kworker/31:1-1723 [031] ..... 95.011373: p9_fd_poll: p9_fd_poll rd poll
kworker/31:1-1723 [031] ..... 95.011373: p9_fd_poll: p9_fd_request wr poll
kworker/31:1-1723 [031] ..... 95.011374: p9_fd_poll: p9_fd_poll rd poll
kworker/31:1-1723 [031] ..... 95.011374: p9_fd_poll: p9_fd_request wr poll
kworker/31:1-1723 [031] ..... 95.011374: p9_read_work: Data read wait 7
kworker/31:1-1723 [031] ..... 95.011375: p9_read_work: Data read 7
kworker/31:1-1723 [031] ..... 95.011375: p9_fd_poll: p9_fd_poll rd poll
kworker/31:1-1723 [031] ..... 95.011375: p9_fd_poll: p9_fd_request wr poll
kworker/31:1-1723 [031] ..... 95.011376: p9_read_work: Data read wait 55
kworker/31:1-1723 [031] ..... 95.011376: p9_read_work: Data read 55
repro-4076 [031] ..... 95.011381: p9_client_rpc: Wait event killable (0)
repro-4076 [031] ..... 95.011382: p9_client_rpc: Check error (0)
repro-4076 [031] ..... 95.011382: p9_client_write: p9_client_rpc done
repro-4076 [031] ..... 95.011382: p9_client_write: p9_pdup
repro-4076 [031] ..... 95.011382: p9_client_write: iter revert

==== Above bits are same as the previous traces ====

repro-4076 [031] ..... 95.011383: p9_client_write: p9_client_rpc
repro-4076 [031] ..... 95.011393: p9_fd_request: p9_fd_request
repro-4076 [031] ..... 95.011394: p9_fd_request: p9_fd_request EPOLL
repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_poll rd poll
repro-4076 [031] ..... 95.011394: p9_fd_poll: p9_fd_request wr poll
repro-4076 [031] ..... 99.731970: p9_client_rpc: Wait event killable (-512)

==== The wait event here fails with -ERESTARTSYS ====

repro-4076 [031] ..... 99.731979: p9_client_rpc: Flushed (-512)
repro-4076 [031] ..... 99.731983: p9_client_rpc: Final error (-512)
repro-4076 [031] ..... 99.731983: p9_client_write: p9_client_rpc done
repro-4076 [031] ..... 99.731984: p9_client_write: p9_client_rpc error (-512)
repro-4076 [031] ..... 99.731985: v9fs_issue_write: Issue write done 2 err(-512)
repro-4076 [031] ..... 99.731987: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated
repro-4076 [031] ..... 99.731987: netfs_wake_write_collector: Wake collector
repro-4076 [031] ..... 99.731988: netfs_wake_write_collector: Queuing collector work
repro-4076 [031] ..... 99.731991: v9fs_issue_write: Issue write subrequest terminated 2
repro-4076 [031] ..... 99.731991: netfs_unbuffered_write: netfs_unbuffered_write
repro-4076 [031] ..... 99.731991: netfs_end_issue_write: netfs_end_issue_write
repro-4076 [031] ..... 99.731992: netfs_end_issue_write: Write collector need poke 0
repro-4076 [031] ..... 99.731993: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!

==== The reproducer waits on bit NETFS_RREQ_IN_PROGRESS ====

kworker/u1028:3-3860 [030] ..... 99.732028: netfs_retry_writes: netfs_reissue_write 1
kworker/u1028:3-3860 [030] ..... 99.732030: v9fs_issue_write: v9fs_issue_write
kworker/u1028:3-3860 [030] ..... 99.732031: p9_client_write: p9_client_rpc
kworker/u1028:3-3860 [030] ..... 99.732051: p9_fd_request: p9_fd_request
kworker/u1028:3-3860 [030] ..... 99.732052: p9_fd_request: p9_fd_request EPOLL
kworker/u1028:3-3860 [030] ..... 99.732052: p9_fd_poll: p9_fd_poll rd poll
kworker/u1028:3-3860 [030] ..... 99.732053: p9_fd_poll: p9_fd_request wr poll

==== The retry worker seemingly gets stuck post p9_fd_poll() waiting for wakeup ====

---

That is my analysis so far.


Oleg.


--
Thanks and Regards,
Prateek
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 32619d146cbc..3fbfc6882c35 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -56,10 +56,13 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
struct p9_fid *fid = subreq->rreq->netfs_priv;
int err, len;

+ trace_printk("v9fs_issue_write\n");
len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err);
+ trace_printk("Issue write done %d err(%d)\n", len, err);
if (len > 0)
__set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
netfs_write_subrequest_terminated(subreq, len ?: err, false);
+ trace_printk("Issue write subrequest terminated %d\n", len);
}

/**
@@ -74,6 +77,7 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
int total, err;

total = p9_client_read(fid, pos, &subreq->io_iter, &err);
+ trace_printk("v9fs_issue_read %d err(%d)\n", total, err);

/* if we just extended the file size, any portion not in
* cache won't be on server and is zeroes */
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 42ce53cc216e..6333b8bb4008 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -105,6 +105,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *

if (!async) {
trace_netfs_rreq(wreq, netfs_rreq_trace_wait_ip);
+ trace_printk("Waiting on NETFS_RREQ_IN_PROGRESS!\n");
wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
TASK_UNINTERRUPTIBLE);
ret = wreq->error;
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 23c75755ad4e..a609063db8a7 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -446,6 +446,7 @@ static void netfs_read_collection(struct netfs_io_request *rreq)
task_io_account_read(rreq->transferred);

trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip);
+ trace_printk("Read collect clearing and waking up!\n");
clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);

trace_netfs_rreq(rreq, netfs_rreq_trace_done);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 3fca59e6475d..a54aa2bcc786 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -429,6 +429,7 @@ void netfs_write_collection_worker(struct work_struct *work)

_debug("finished");
trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
+ trace_printk("Write collect clearing and waking up!\n");
clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags);

if (wreq->iocb) {
@@ -449,10 +450,14 @@ void netfs_write_collection_worker(struct work_struct *work)
*/
void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
{
+ trace_printk("Wake collector\n");
if (!work_pending(&wreq->work)) {
netfs_get_request(wreq, netfs_rreq_trace_get_work);
- if (!queue_work(system_unbound_wq, &wreq->work))
+ trace_printk("Queuing collector work\n");
+ if (!queue_work(system_unbound_wq, &wreq->work)) {
+ trace_printk("Put request\n");
netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq);
+ }
}
}

@@ -542,8 +547,10 @@ void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
/* If we are at the head of the queue, wake up the collector,
* transferring a ref to it if we were the ones to do so.
*/
- if (list_is_first(&subreq->rreq_link, &stream->subrequests))
+ if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
+ trace_printk("Collector woken up from netfs_write_subrequest_terminated\n");
netfs_wake_write_collector(wreq, was_async);
+ }

netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
}
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 77279fc5b5a7..22a4b793e789 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -232,8 +232,10 @@ static void netfs_do_issue_write(struct netfs_io_stream *stream,

_enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len);

- if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+ if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) {
+ trace_printk("netfs_do_issue_write failed!\n");
return netfs_write_subrequest_terminated(subreq, subreq->error, false);
+ }

trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
stream->issue_write(subreq);
@@ -264,6 +266,7 @@ void netfs_issue_write(struct netfs_io_request *wreq,

if (!subreq)
return;
+ trace_printk("netfs_issue_write!\n");
stream->construct = NULL;
subreq->io_iter.count = subreq->len;
netfs_do_issue_write(stream, subreq);
@@ -290,6 +293,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq,
_enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0);

if (subreq && start != subreq->start + subreq->len) {
+ trace_printk("netfs_issue_write from advance write!\n");
netfs_issue_write(wreq, stream);
subreq = NULL;
}
@@ -307,6 +311,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq,
if (subreq->len >= stream->sreq_max_len ||
subreq->nr_segs >= stream->sreq_max_segs ||
to_eof) {
+ trace_printk("netfs_issue_write from advance write! 2\n");
netfs_issue_write(wreq, stream);
subreq = NULL;
}
@@ -397,6 +402,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
* when they were read. Note that these appear as a special
* write-back group.
*/
+ trace_printk("netfs_issue_write in folio copy\n");
if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
netfs_issue_write(wreq, upload);
} else if (fgroup != wreq->group) {
@@ -526,6 +532,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
*/
static void netfs_end_issue_write(struct netfs_io_request *wreq)
{
+ trace_printk("netfs_end_issue_write\n");
bool needs_poke = true;

smp_wmb(); /* Write subreq lists before ALL_QUEUED. */
@@ -541,6 +548,7 @@ static void netfs_end_issue_write(struct netfs_io_request *wreq)
netfs_issue_write(wreq, stream);
}

+ trace_printk("Write collector need poke %d\n", needs_poke);
if (needs_poke)
netfs_wake_write_collector(wreq, false);
}
@@ -730,6 +738,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t
break;
}

+ trace_printk("netfs_unbuffered_write\n");
netfs_end_issue_write(wreq);
_leave(" = %d", error);
return error;
@@ -908,6 +917,7 @@ int netfs_writeback_single(struct address_space *mapping,
}

stop:
+ trace_printk("netfs_issue_write writeback single\n");
for (int s = 0; s < NR_IO_STREAMS; s++)
netfs_issue_write(wreq, &wreq->io_streams[s]);
smp_wmb(); /* Write lists before ALL_QUEUED. */
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 545d33079a77..60a03079766b 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -43,6 +43,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,

iov_iter_revert(&source, subreq->len - source.count);
netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+ trace_printk("netfs_reissue_write 1\n");
netfs_reissue_write(stream, subreq, &source);
}
}
@@ -116,6 +117,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
boundary = true;

netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+ trace_printk("netfs_reissue_write 2\n");
netfs_reissue_write(stream, subreq, &source);
if (subreq == to)
break;
@@ -183,6 +185,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
boundary = false;
}

+ trace_printk("netfs_reissue_write 3\n");
netfs_reissue_write(stream, subreq, &source);
if (!len)
break;
diff --git a/net/9p/client.c b/net/9p/client.c
index 09f8ced9f8bb..909c83b0e015 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -633,12 +633,16 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);

/* we allow for any status other than disconnected */
- if (c->status == Disconnected)
+ if (c->status == Disconnected) {
+ trace_printk("p9_client_prepare_req eio 1\n");
return ERR_PTR(-EIO);
+ }

/* if status is begin_disconnected we allow only clunk request */
- if (c->status == BeginDisconnect && type != P9_TCLUNK)
+ if (c->status == BeginDisconnect && type != P9_TCLUNK) {
+ trace_printk("p9_client_prepare_req eio 2\n");
return ERR_PTR(-EIO);
+ }

va_copy(apc, ap);
req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc);
@@ -690,8 +694,10 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
va_start(ap, fmt);
req = p9_client_prepare_req(c, type, tsize, rsize, fmt, ap);
va_end(ap);
- if (IS_ERR(req))
+ if (IS_ERR(req)) {
+ trace_printk("p9_client_rpc early err return\n");
return req;
+ }

req->tc.zc = false;
req->rc.zc = false;
@@ -706,9 +712,12 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
err = c->trans_mod->request(c, req);
if (err < 0) {
/* write won't happen */
+ trace_printk("write won't happnen (%d)\n", err);
p9_req_put(c, req);
- if (err != -ERESTARTSYS && err != -EFAULT)
+ if (err != -ERESTARTSYS && err != -EFAULT) {
+ trace_printk("Client disconnected (no write)\n");
c->status = Disconnected;
+ }
goto recalc_sigpending;
}
again:
@@ -721,6 +730,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
*/
smp_rmb();

+ trace_printk("Wait event killable (%d) (%d) (%d) (%d)\n", err, READ_ONCE(req->status), c->status, type);
if (err == -ERESTARTSYS && c->status == Connected &&
type == P9_TFLUSH) {
sigpending = 1;
@@ -731,6 +741,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
if (READ_ONCE(req->status) == REQ_STATUS_ERROR) {
p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
err = req->t_err;
+ trace_printk("REQ_STATUS_ERROR (%d)\n", err);
}
if (err == -ERESTARTSYS && c->status == Connected) {
p9_debug(P9_DEBUG_MUX, "flushing\n");
@@ -743,6 +754,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
/* if we received the response anyway, don't signal error */
if (READ_ONCE(req->status) == REQ_STATUS_RCVD)
err = 0;
+
+ trace_printk("Flushed (%d)\n", err);
}
recalc_sigpending:
if (sigpending) {
@@ -754,11 +767,13 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
goto reterr;

err = p9_check_errors(c, req);
+ trace_printk("Check error (%d)\n", err);
trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
p9_req_put(c, req);
+ trace_printk("Final error (%d)\n", err);
return ERR_PTR(safe_errno(err));
}

@@ -808,8 +823,10 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
err = c->trans_mod->zc_request(c, req, uidata, uodata,
inlen, olen, in_hdrlen);
if (err < 0) {
- if (err == -EIO)
+ if (err == -EIO) {
+ trace_printk("Client disconnected (zero_copy)\n");
c->status = Disconnected;
+ }
if (err != -ERESTARTSYS)
goto recalc_sigpending;
}
@@ -1093,6 +1110,7 @@ EXPORT_SYMBOL(p9_client_destroy);
void p9_client_disconnect(struct p9_client *clnt)
{
p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt);
+ trace_printk("Client disconnected\n");
clnt->status = Disconnected;
}
EXPORT_SYMBOL(p9_client_disconnect);
@@ -1100,6 +1118,7 @@ EXPORT_SYMBOL(p9_client_disconnect);
void p9_client_begin_disconnect(struct p9_client *clnt)
{
p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt);
+ trace_printk("Client begin disconnect\n");
clnt->status = BeginDisconnect;
}
EXPORT_SYMBOL(p9_client_begin_disconnect);
@@ -1572,11 +1591,13 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
offset, rsize);
} else {
non_zc = 1;
+ trace_printk("RPC issue\n");
req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
rsize);
}
if (IS_ERR(req)) {
*err = PTR_ERR(req);
+ trace_printk("read RPC issue error(%d)\n", *err);
if (!non_zc)
iov_iter_revert(to, count - iov_iter_count(to));
return 0;
@@ -1587,12 +1608,14 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
if (*err) {
if (!non_zc)
iov_iter_revert(to, count - iov_iter_count(to));
+ trace_printk("p9pdu_readf error(%d)\n", *err);
trace_9p_protocol_dump(clnt, &req->rc);
p9_req_put(clnt, req);
return 0;
}
if (rsize < received) {
pr_err("bogus RREAD count (%d > %d)\n", received, rsize);
+ trace_printk("bogus read\n");
received = rsize;
}

@@ -1602,6 +1625,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
int n = copy_to_iter(dataptr, received, to);

if (n != received) {
+ trace_printk("copy_to_iter fault\n");
*err = -EFAULT;
p9_req_put(clnt, req);
return n;
@@ -1642,23 +1666,29 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
rsize, P9_ZC_HDR_SZ, "dqd",
fid->fid, offset, rsize);
} else {
+ trace_printk("p9_client_rpc\n");
req = p9_client_rpc(clnt, P9_TWRITE, "dqV", fid->fid,
offset, rsize, from);
+ trace_printk("p9_client_rpc done\n");
}
if (IS_ERR(req)) {
iov_iter_revert(from, count - iov_iter_count(from));
*err = PTR_ERR(req);
+ trace_printk("p9_client_rpc error (%d)\n", *err);
break;
}

+ trace_printk("p9_pdup\n");
*err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &written);
if (*err) {
+ trace_printk("p9_pdup err(%d)\n", *err);
iov_iter_revert(from, count - iov_iter_count(from));
trace_9p_protocol_dump(clnt, &req->rc);
p9_req_put(clnt, req);
break;
}
if (rsize < written) {
+ trace_printk("p9_pdup bogus %d %d\n", written, rsize);
pr_err("bogus RWRITE count (%d > %d)\n", written, rsize);
written = rsize;
}
@@ -1666,6 +1696,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written);

p9_req_put(clnt, req);
+ trace_printk("iter revert\n");
iov_iter_revert(from, count - written - iov_iter_count(from));
total += written;
offset += written;
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138..3c83b3855e82 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -196,6 +196,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
return;
}

+ trace_printk("Connection cancelled (%d)\n", err);
m->err = err;

list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
@@ -233,9 +234,12 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
return EPOLLERR;
}

+ trace_printk("p9_fd_poll rd poll\n");
ret = vfs_poll(ts->rd, pt);
- if (ts->rd != ts->wr)
+ if (ts->rd != ts->wr) {
+ trace_printk("p9_fd_request wr poll\n");
ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
+ }
return ret;
}

@@ -263,9 +267,13 @@ static int p9_fd_read(struct p9_client *client, void *v, int len)
p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");

pos = ts->rd->f_pos;
+ trace_printk("Data read wait %d\n", len);
ret = kernel_read(ts->rd, v, len, &pos);
- if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
+ trace_printk("Data read %d\n", ret);
+ if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) {
+ trace_printk("Client disconnected (fd_read)\n");
client->status = Disconnected;
+ }
return ret;
}

@@ -330,6 +338,7 @@ static void p9_read_work(struct work_struct *work)
if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) {
p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n",
m->rc.tag);
+ trace_printk("p9_tag_lookup error\n");
err = -EIO;
goto error;
}
@@ -338,6 +347,7 @@ static void p9_read_work(struct work_struct *work)
p9_debug(P9_DEBUG_ERROR,
"requested packet size too big: %d for tag %d with capacity %zd\n",
m->rc.size, m->rc.tag, m->rreq->rc.capacity);
+ trace_printk("capacity error\n");
err = -EIO;
goto error;
}
@@ -348,6 +358,7 @@ static void p9_read_work(struct work_struct *work)
m->rc.tag, m->rreq);
p9_req_put(m->client, m->rreq);
m->rreq = NULL;
+ trace_printk("sdata error\n");
err = -EIO;
goto error;
}
@@ -375,6 +386,7 @@ static void p9_read_work(struct work_struct *work)
p9_debug(P9_DEBUG_ERROR,
"Request tag %d errored out while we were reading the reply\n",
m->rc.tag);
+ trace_printk("request tag error %d status(%d)\n", m->rc.tag, m->rreq->status);
err = -EIO;
goto error;
}
@@ -403,6 +415,7 @@ static void p9_read_work(struct work_struct *work)

return;
error:
+ trace_printk("Connection cancel\n");
p9_conn_cancel(m, err);
clear_bit(Rworksched, &m->wsched);
}
@@ -429,9 +442,13 @@ static int p9_fd_write(struct p9_client *client, void *v, int len)
if (!(ts->wr->f_flags & O_NONBLOCK))
p9_debug(P9_DEBUG_ERROR, "blocking write ...\n");

+ trace_printk("Data write wait %d\n",len);
ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos);
- if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
+ trace_printk("Data written %ld\n", ret);
+ if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) {
+ trace_printk("Client disconnected (fd_write)\n");
client->status = Disconnected;
+ }
return ret;
}

@@ -519,6 +536,7 @@ static void p9_write_work(struct work_struct *work)
return;

error:
+ trace_printk("Connection cancel\n");
p9_conn_cancel(m, err);
clear_bit(Wworksched, &m->wsched);
}
@@ -628,6 +646,7 @@ static void p9_poll_mux(struct p9_conn *m)
n = p9_fd_poll(m->client, NULL, &err);
if (n & (EPOLLERR | EPOLLHUP | EPOLLNVAL)) {
p9_debug(P9_DEBUG_TRANS, "error mux %p err %d\n", m, n);
+ trace_printk("Connection cancel\n");
p9_conn_cancel(m, err);
}

@@ -668,12 +687,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
struct p9_trans_fd *ts = client->trans;
struct p9_conn *m = &ts->conn;

+ trace_printk("p9_fd_request\n");
p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
m, current, &req->tc, req->tc.id);

spin_lock(&m->req_lock);

if (m->err < 0) {
+ trace_printk("p9_fd_request error\n");
spin_unlock(&m->req_lock);
return m->err;
}
@@ -682,13 +703,16 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
list_add_tail(&req->req_list, &m->unsent_req_list);
spin_unlock(&m->req_lock);

+ trace_printk("p9_fd_request EPOLL\n");
if (test_and_clear_bit(Wpending, &m->wsched))
n = EPOLLOUT;
else
n = p9_fd_poll(m->client, NULL, NULL);

- if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
+ if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) {
+ trace_printk("p9_fd_request schedule work\n");
schedule_work(&m->wq);
+ }

return 0;
}
@@ -852,6 +876,7 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
data_race(ts->wr->f_flags |= O_NONBLOCK);

client->trans = ts;
+ trace_printk("Client connected\n");
client->status = Connected;

return 0;
@@ -889,6 +914,7 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket)
get_file(file);
p->wr = p->rd = file;
client->trans = p;
+ trace_printk("Client connected\n");
client->status = Connected;

p->rd->f_flags |= O_NONBLOCK;
@@ -920,6 +946,7 @@ static void p9_conn_destroy(struct p9_conn *m)
m->wreq = NULL;
}

+ trace_printk("Connection destroy\n");
p9_conn_cancel(m, -ECONNRESET);

m->client = NULL;
@@ -942,6 +969,7 @@ static void p9_fd_close(struct p9_client *client)
if (!ts)
return;

+ trace_printk("Client disconnected (fd_close)\n");
client->status = Disconnected;

p9_conn_destroy(&ts->conn);