Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks

From: Christian Brauner
Date: Fri Apr 20 2018 - 12:16:53 EST


On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > Christian Brauner <christian.brauner@xxxxxxxxxx> writes:
> > >
> > > > Now that it's possible to have a different set of uevents in different
> > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > introduced. This increases performance as locking is now restricted to the
> > > > network namespace affected by the uevent rather than locking
> > > > everything.
> > >
> > > Numbers please. I personally expect that the netlink mc_list issues
> > > will swamp any benefit you get from this.
> >
> > I don't see how this would be the case. The gist of this is:
> > Every time you send a uevent into a network namespace *not* owned by
> > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> > effectively blocking the host from processing uevents even though
> > - the uevent you're receiving might be totally different from the
> > uevent that you're sending
> > - the uevent socket of the non-init_user_ns owned network namespace
> > isn't even recorded in the list.
> >
> > The other argument is that we now have properly isolated network
> > namespaces wrt uevents such that each netns can have its own set of
> > uevents. This can either happen by a sufficiently privileged userspace
> > process sending it uevents that are only dedicated to that specific
> > netns. Or - and this *has been true for a long time* - because network
> > devices are *properly namespaced*. Meaning a uevent for that network
> > device is *tied to a network namespace*. For both cases the uevent
> > sequence numbering will be absolutely misleading. For example, whenever
> > you create e.g. a new veth device in a new network namespace it
> > shouldn't be accounted against the initial network namespace but *only*
> > against the network namespace that has that device added to it.
>
> Eric, I did the testing. Here's what I did:
>
> I compiled two 4.17-rc1 Kernels:
> - one with per netns uevent seqnums with decoupled locking
> - one without per netns uevent seqnums with decoupled locking
>
> # Testcase 1:
> Only Injecting Uevents into network namespaces not owned by the initial user
> namespace.
> - created 1000 new user namespace + network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times meaning
> 10,000,000 (10 million) uevents were injected. (The high number of
> uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
> 67 µs
> - *with* uevent sequence number namespacing:
> 55 µs
> - makes a difference of 12 µs
>
> # Testcase 2:
> Injecting Uevents into network namespaces not owned by the initial user
> namespace and network namespaces owned by the initial user namespace.
> - created 500 new user namespace + network namespace pairs
> - created 500 new network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times meaning
> 10,000,000 (10 million) uevents were injected. (The high number of
> uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
> 572 µs
> - *with* uevent sequence number namespacing:
> 514 µs
> - makes a difference of 58 µs
>
> So there's a performance gain. The third case would be to create a bunch
> of hanging processes that send SIGSTOP to themselves but do not actually
> open a uevent socket in their respective namespaces and then inject
> uevents into them. I expect there to be an even larger performance
> benefit since the rtnl_table_lock() isn't hit in this case because
> there are no listeners.

I ran the third test case as well:
- created 500 new user namespace + network namespace pairs *without
uevent listeners*
- created 500 new network namespace pairs *without uevent listeners*
- injected uevents into each of those network namespaces 10,000 times meaning
10,000,000 (10 million) uevents were injected. (The high number of
uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
206 µs
- *with* uevent sequence number namespacing:
163 µs
- makes a difference of 43 µs
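
For anyone who wants to reproduce a "mean transaction time" figure, here is a
minimal sketch of one way to compute it. It is not the harness used for the
numbers above. The helper names are hypothetical, and
inject_uevent_placeholder() is a stand-in that does nothing; a real test would
replace it with code that injects a uevent into the target network namespace.

```python
import time


def mean_transaction_us(transaction, iterations):
    """Return the mean wall-clock time of one transaction() call in microseconds."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter_ns()
    for _ in range(iterations):
        transaction()
    elapsed_ns = time.perf_counter_ns() - start
    # Average over all iterations, then convert nanoseconds to microseconds.
    return elapsed_ns / iterations / 1000.0


def inject_uevent_placeholder():
    # Hypothetical stand-in (assumption): a real test would send one uevent
    # into a network namespace here. This placeholder does nothing.
    pass


if __name__ == "__main__":
    mean_us = mean_transaction_us(inject_uevent_placeholder, 10_000)
    print(f"mean transaction time: {mean_us:.3f} us")
```

Timing the whole loop once and dividing by the iteration count keeps the clock
overhead out of each individual sample, which matters when a single
transaction only takes tens of microseconds.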

So this test case shows a performance improvement as well.
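
To put the three results side by side, the absolute differences can be
expressed relative to each baseline. The short script below is only a sketch
of that arithmetic. The mean times are the ones reported in this thread; the
names RESULTS_US and gain() are made up for the illustration.

```python
# Mean transaction times in microseconds, as reported in this thread:
# (without seqnum namespacing, with seqnum namespacing)
RESULTS_US = {
    "testcase 1": (67, 55),
    "testcase 2": (572, 514),
    "testcase 3": (206, 163),
}


def gain(before_us, after_us):
    """Return (absolute difference in us, difference as a percentage of before_us)."""
    diff = before_us - after_us
    return diff, 100.0 * diff / before_us


for name, (before, after) in RESULTS_US.items():
    diff, pct = gain(before, after)
    print(f"{name}: {diff} us faster ({pct:.1f}% of the baseline)")
```

This works out to roughly 17.9%, 10.1% and 20.9% for the three test cases
respectively.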

Thanks!
Christian