Re: [PATCH net 2/2] auth_gss: Fix deadlock that blocks rpcsec_gss_exit_net when use-gss-proxy==1

From: Trond Myklebust
Date: Tue Sep 28 2021 - 09:30:24 EST

On Tue, 2021-09-28 at 11:14 +0800, Wang Hai wrote:
> When use-gss-proxy is set to 1, write_gssp() creates a rpc client in
> gssp_rpc_create(), this increases the netns refcount by 2, these
> refcounts are supposed to be released in rpcsec_gss_exit_net(), but
> it
> will never happen because rpcsec_gss_exit_net() is triggered only
> when
> the netns refcount gets to 0, specifically:
>     refcount=0 -> cleanup_net() -> ops_exit_list ->
> rpcsec_gss_exit_net
> It is a deadlock situation here, refcount will never get to 0 unless
> rpcsec_gss_exit_net() is called. So, in this case, the netns refcount
> should not be increased.
> In this case, xprt will take a netns refcount which is not supposed
> to be taken. Add a new flag to rpc_create_args called
> RPC_CLNT_CREATE_NO_NET_REF for not increasing the netns refcount.
> It is safe not to hold the netns refcount, because when
> cleanup_net(), it
> will hold the gssp_lock and then shut down the rpc client
> synchronously.
I don't like this solution at all. Adding this kind of flag is going to
lead to problems down the road.

Is there any reason whatsoever why we need this RPC client to exist
when there is no active knfsd server? IOW: Is there any reason why we
shouldn't defer creating this RPC client for when knfsd starts up in
this net namespace, and why we can't shut it down when knfsd shuts

Trond Myklebust
Linux NFS client maintainer, Hammerspace