Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv_done()

From: asmadeus
Date: Mon Jul 11 2022 - 03:40:12 EST

Hangyu Hua wrote on Mon, Jul 11, 2022 at 02:59:07PM +0800:
> A ref got in p9_tag_lookup needs to be put when functions enter the
> error path.
> Fix this by adding p9_req_put in error path.

I wish it was that simple.

Did you actually observe a leak?

> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 8f8f95e39b03..c4ccb7b9e1bf 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -343,6 +343,7 @@ static void p9_read_work(struct work_struct *work)
> p9_debug(P9_DEBUG_ERROR,
> "No recv fcall for tag %d (req %p), disconnecting!\n",
> m->rc.tag, m->rreq);
> + p9_req_put(m->rreq);
> m->rreq = NULL;
> err = -EIO;
> goto error;
> @@ -372,6 +373,8 @@ static void p9_read_work(struct work_struct *work)
> "Request tag %d errored out while we were reading the reply\n",
> m->rc.tag);
> err = -EIO;
> + p9_req_put(m->rreq);
> + m->rreq = NULL;
> goto error;
> }
> spin_unlock(&m->client->lock);

for tcp, we still have that request in m->req_list, so we should be
calling p9_client_cb which will do the p9_req_put in p9_conn_cancel.

If you do it here, you'll get a refcount overflow and use after free.

> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 88e563826674..82b5d6894ee2 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -317,6 +317,7 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
> /* Check that we have not yet received a reply for this request.
> */
> if (unlikely(req->rc.sdata)) {
> + p9_req_put(req);
> pr_err("Duplicate reply for request %d", tag);
> goto err_out;
> }

This one isn't as clear cut, I see that they put the client in a
FLUSHING state but nothing seems to acton on it... But if this happens
we're already in the use after free realm -- it means rc.sdata was
already set so the other thread could be calling p9_client_cb anytime if
it already hasn't, and yet another thread will then do the final ref put
and free this.
We shouldn't free this here as that would also be an overflow. The best
possible thing to do at this point is just to stop using that pointer.

If you actually run into a problem with these refcounts (should get a
warning on umount that something didn't get freed) then by all mean
let's look further into it, but please don't send such patches without
testing the error paths you're "fixing" -- I'm pretty sure a reproducer
to hit these paths would bark errors in dmesg as refcount has an
overflow check.