Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR

From: Kirill Tkhai
Date: Mon Sep 03 2018 - 09:42:00 EST


On 01.09.2018 04:34, Christian Brauner wrote:
> On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
>> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
>>> On 29.08.2018 21:13, Christian Brauner wrote:
>>>> Hi Kirill,
>>>>
>>>> Thanks for the question!
>>>>
>>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
>>>>> Hi, Christian,
>>>>>
>>>>> On 29.08.2018 02:18, Christian Brauner wrote:
>>>>>> From: Christian Brauner <christian@xxxxxxxxxx>
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
>>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
>>>>>> to signficant performance increases since it allows userspace to avoid
>>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
>>>>>> interfaces from the netns associated with the netns_fd. Especially when a
>>>>>> lot of network namespaces are in use, using setns() becomes increasingly
>>>>>> problematic when performance matters.
>>>>>
>>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
>>>>> problems with the performance? You should do this only once on application
>>>>> startup, and then you have created netlink sockets in any net namespaces you
>>>>> need. What is the problem here?
>>>>
>>>> So we have a daemon (LXD) that is often running thousands of containers.
>>>> When users issue a lxc list request against the daemon it returns a list
>>>> of all containers including all of the interfaces and addresses for each
>>>> container. To retrieve those addresses we currently rely on setns() +
>>>> getifaddrs() for each of those containers. That has horrible
>>>> performance.
>>>
>>> Could you please provide some numbers showing that setns()
>>> introduces signify performance decrease in the application?
>>
>> Sure, might take a few days++ though since I'm traveling.
>
> Hey Kirill,
>
> As promised here's some code [1] that compares performance. I basically
> did a setns() to the network namespace and called getifaddrs() and
> compared this to the scenario where I use the newly introduced property.
> I did this 1 million times and calculated the mean getifaddrs()
> retrieval time based on that.
> My patch cuts the time in half.
>
> brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> Mean time in microseconds (netnsid): 81
> Mean time in microseconds (setns): 162
>
> Christian
>
> I'm only appending the main file since the netsns_getifaddrs() code I
> used is pretty long:
>
> [1]:
>
> #define _GNU_SOURCE
> #define __STDC_FORMAT_MACROS
> #include <fcntl.h>
> #include <inttypes.h>
> #include <linux/types.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #include "netns_getifaddrs.h"
> #include "print_getifaddrs.h"
>
> #define ITERATIONS 1000000
> #define SEC_TO_MICROSEC(x) (1000000 * (x))
>
> int main(int argc, char *argv[])
> {
> int i, ret;
> __s32 netns_id;
> pid_t netns_pid;
> char path[1024];
> intmax_t times[ITERATIONS];
> struct timeval t1, t2;
> intmax_t time_in_mcs;
> int fret = EXIT_FAILURE;
> intmax_t sum = 0;
> int host_netns_fd = -1, netns_fd = -1;
>
> struct ifaddrs *ifaddrs = NULL;
>
> if (argc != 3)
> goto on_error;
>
> netns_id = atoi(argv[1]);
> netns_pid = atoi(argv[2]);
> printf("%d\n", netns_id);
> printf("%d\n", netns_pid);
>
> for (i = 0; i < ITERATIONS; i++) {
> ret = gettimeofday(&t1, NULL);
> if (ret < 0)
> goto on_error;
>
> ret = netns_getifaddrs(&ifaddrs, netns_id);
> freeifaddrs(ifaddrs);
> if (ret < 0)
> goto on_error;
>
> ret = gettimeofday(&t2, NULL);
> if (ret < 0)
> goto on_error;
>
> time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> times[i] = time_in_mcs;
> }
>
> for (i = 0; i < ITERATIONS; i++)
> sum += times[i];
>
> printf("Mean time in microseconds (netnsid): %ju\n",
> sum / ITERATIONS);
>
> ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> if (ret < 0 || (size_t)ret >= sizeof(path))
> goto on_error;
>
> netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> if (netns_fd < 0)
> goto on_error;
>
> host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> if (host_netns_fd < 0)
> goto on_error;
>
> memset(times, 0, sizeof(times));
> for (i = 0; i < ITERATIONS; i++) {
> ret = gettimeofday(&t1, NULL);
> if (ret < 0)
> goto on_error;
>
> ret = setns(netns_fd, CLONE_NEWNET);
> if (ret < 0)
> goto on_error;
>
> ret = getifaddrs(&ifaddrs);
> freeifaddrs(ifaddrs);
> if (ret < 0)
> goto on_error;
>
> ret = gettimeofday(&t2, NULL);
> if (ret < 0)
> goto on_error;
>
> ret = setns(host_netns_fd, CLONE_NEWNET);
> if (ret < 0)
> goto on_error;
>
> time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> times[i] = time_in_mcs;
> }
>
> for (i = 0; i < ITERATIONS; i++)
> sum += times[i];
>
> printf("Mean time in microseconds (setns): %ju\n",
> sum / ITERATIONS);
>
> fret = EXIT_SUCCESS;
>
> on_error:
> if (netns_fd >= 0)
> close(netns_fd);
>
> if (host_netns_fd >= 0)
> close(host_netns_fd);
>
> exit(fret);
> }

But this is a synthetic test, while I asked about real workflow.
Is this real problem for lxd, and there is observed performance
decrease?

I see, there are already nsid use in existing code, but I have to say,
that adding new types of variables as a system call arguments make it
less modular. When you request RTM_GETADDR for a specific nsid, this
obligates the kernel to make everything unchangable during the call,
doesn't it?

We may look at existing code as example, what problems this may cause.
Look at do_setlink(). There are many different types of variables,
and all of them should be dereferenced atomically. So, all the function
is executed under global rtnl. And this causes delays in another config
places, which are sensitive to rtnl. So, adding more dimensions to RTM_GETADDR
may turn it in the same overloaded function as do_setlink() is. And one
day, when we reach the state, when we must rework all of this, we won't
be able to do this. I'm not sure, now is not too late.

I just say about this, because it's possible we should consider another
approach in rtnl communication in general, and stop to overload it.

Kirill