Re: [PATCH] SUNRPC: Fix a race in xs_reset_transport
From: Jeff Layton
Date: Thu Sep 17 2015 - 10:59:19 EST
On Thu, 17 Sep 2015 10:50:01 -0400
Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> On Thu, 2015-09-17 at 10:18 -0400, Jeff Layton wrote:
> > On Thu, 17 Sep 2015 09:38:33 -0400
> > Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> >
> > > On Tue, Sep 15, 2015 at 2:52 PM, Jeff Layton <jlayton@xxxxxxxxxxxxxxx> wrote:
> > > > On Tue, 15 Sep 2015 16:49:23 +0100
> > > > "Suzuki K. Poulose" <suzuki.poulose@xxxxxxx> wrote:
> > > >
> > > > > net/sunrpc/xprtsock.c | 9 ++++++++-
> > > > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > > > > index 7be90bc..6f4789d 100644
> > > > > --- a/net/sunrpc/xprtsock.c
> > > > > +++ b/net/sunrpc/xprtsock.c
> > > > > @@ -822,9 +822,16 @@ static void xs_reset_transport(struct sock_xprt *transport)
> > > > > if (atomic_read(&transport->xprt.swapper))
> > > > > sk_clear_memalloc(sk);
> > > > >
> > > > > - kernel_sock_shutdown(sock, SHUT_RDWR);
> > > > > + if (sock)
> > > > > + kernel_sock_shutdown(sock, SHUT_RDWR);
> > > > >
> > > >
> > > > Good catch, but...isn't this still racy? What prevents
> > > > transport->sock being set to NULL after you assign it to "sock"
> > > > but before calling kernel_sock_shutdown?
> > >
> > > The XPRT_LOCKED state.
> > >
> >
> > IDGI -- if the XPRT_LOCKED bit was supposed to prevent that, then
> > how could you hit the original race? There should be no concurrent
> > callers to xs_reset_transport on the same xprt, right?
>
> Correct. The only exception is xs_destroy.
>
> > AFAICT, that bit is not set in the xprt_destroy codepath, which may
> > be the root cause of the problem. How would we take it there anyway?
> > xprt_destroy is void return, and may not be called in the context of
> > an rpc_task. If it's contended, what do we do? Sleep until it's
> > cleared?
> >
>
> How about the following.
>
> 8<-----------------------------------------------------------------
> From e2e68218e66c6b0715fd6b8f1b3092694a7c0e62 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Date: Thu, 17 Sep 2015 10:42:27 -0400
> Subject: [PATCH] SUNRPC: Fix races between socket connection and destroy code
>
> When we're destroying the socket transport, we need to ensure that
> we cancel any existing delayed connection attempts, and order them
> w.r.t. the call to xs_close().
>
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> ---
> net/sunrpc/xprtsock.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 7be90bc1a7c2..d2dfbd043bea 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -881,8 +881,11 @@ static void xs_xprt_free(struct rpc_xprt *xprt)
> */
> static void xs_destroy(struct rpc_xprt *xprt)
> {
> + struct sock_xprt *transport = container_of(xprt,
> + struct sock_xprt, xprt);
> dprintk("RPC: xs_destroy xprt %p\n", xprt);
>
> + cancel_delayed_work_sync(&transport->connect_worker);
> xs_close(xprt);
> xs_xprt_free(xprt);
> module_put(THIS_MODULE);
Yeah, that looks like it might do it. The only other xs_reset_transport
callers are in the connect codepath, so canceling the work should
prevent the race. So...
Acked-by: Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
It wouldn't hurt to update the comments over xs_close too for
posterity. They currently say:
* The caller _must_ be holding XPRT_LOCKED in order to avoid issues with
* xs_reset_transport() zeroing the socket from underneath a writer.
...but that rule is clearly broken here.