Re: [PATCH 0/3] auth_gss: netns refcount leaks when use-gss-proxy==1

From: Wenbin Zeng
Date: Fri May 10 2019 - 01:11:12 EST


On Thu, May 09, 2019 at 04:52:18PM -0400, J. Bruce Fields wrote:
> Thanks for figuring this out!
>
> I guess I'll take these patches (with the one fix in your response to
> Al) through the nfsd tree, unless someone tells me otherwise. (The
> original bug was introduced through nfsd.)

Thank you, Bruce.
I am submitting v2 with that fix right away.

>
> How serious are the consequences of the leak? I'm wondering if it's
> worth a stable cc or not.

Though the leak only happens with _privileged_ docker containers that have
gssproxy service enabled and use-gss-proxy set to 1, the consequences
can be ugly, the killed/stopped containers not only leave struct net
unfreed, also possibly leave behind veth devices linked to the netns, in
environments that containers are frequently killed/stopped, it is quite
ugly.

>
> --b.
>
> On Wed, May 01, 2019 at 02:42:22PM +0800, Wenbin Zeng wrote:
> > This patch series fixes an auth_gss bug that results in netns refcount leaks when use-gss-proxy is set to 1.
> >
> > The problem was found in privileged docker containers with gssproxy service enabled and /proc/net/rpc/use-gss-proxy set to 1, the corresponding struct net->count ends up at 2 after container gets killed, the consequence is that the struct net cannot be freed.
> >
> > It turns out that write_gssp() called gssp_rpc_create() to create a rpc client, this increases net->count by 2; rpcsec_gss_exit_net() is supposed to decrease net->count but it never gets called because its call-path is:
> > net->count==0 -> cleanup_net -> ops_exit_list -> rpcsec_gss_exit_net
> > Before rpcsec_gss_exit_net() gets called, net->count cannot reach 0, this is a deadlock situation.
> >
> > To fix the problem, we must break the deadlock, rpcsec_gss_exit_net() should move out of the put() path and find another chance to get called, I think nsfs_evict() is a good place to go, when netns inode gets evicted we call rpcsec_gss_exit_net() to free the rpc client, this requires a new callback i.e. evict to be added in struct proc_ns_operations, and add netns_evict() as one of netns_operations as well.
> >
> > Wenbin Zeng (3):
> > nsfs: add evict callback into struct proc_ns_operations
> > netns: add netns_evict into netns_operations
> > auth_gss: fix deadlock that blocks rpcsec_gss_exit_net when
> > use-gss-proxy==1
> >
> > fs/nsfs.c | 2 ++
> > include/linux/proc_ns.h | 1 +
> > include/net/net_namespace.h | 1 +
> > net/core/net_namespace.c | 12 ++++++++++++
> > net/sunrpc/auth_gss/auth_gss.c | 9 ++++++---
> > 5 files changed, 22 insertions(+), 3 deletions(-)
> >
> > --
> > 1.8.3.1