Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner
Date: Thu Aug 30 2018 - 10:45:53 EST
On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> On 29.08.2018 21:13, Christian Brauner wrote:
> > Hi Kirill,
> >
> > Thanks for the question!
> >
> > On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> >> Hi, Christian,
> >>
> >> On 29.08.2018 02:18, Christian Brauner wrote:
> >>> From: Christian Brauner <christian@xxxxxxxxxx>
> >>>
> >>> Hey,
> >>>
> >>> A while back we introduced and enabled IFLA_IF_NETNSID in
> >>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> >>> to signficant performance increases since it allows userspace to avoid
> >>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> >>> interfaces from the netns associated with the netns_fd. Especially when a
> >>> lot of network namespaces are in use, using setns() becomes increasingly
> >>> problematic when performance matters.
> >>
> >> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> >> problems with the performance? You should do this only once on application
> >> startup, and then you have created netlink sockets in any net namespaces you
> >> need. What is the problem here?
> >
> > So we have a daemon (LXD) that is often running thousands of containers.
> > When users issue a lxc list request against the daemon it returns a list
> > of all containers including all of the interfaces and addresses for each
> > container. To retrieve those addresses we currently rely on setns() +
> > getifaddrs() for each of those containers. That has horrible
> > performance.
>
> Could you please provide some numbers showing that setns()
> introduces signify performance decrease in the application?
Sure, might take a few days++ though since I'm traveling.
>
> I worry about all this just because of netlink interface is
> already overloaded, while this patch makes it even less modular.
Introducing the IFA_IF_NETNSID property will not make the netlink
interface less modular. It is a clean, RTM_*ADDR-request specific
property using network namespace identifiers which we discussed in prior
patches are the way to go forward.
You can already get interfaces via GETLINK from another network
namespaces than the one you reside in (Which we enabled just a few
months back.) but you can't do the same for GETADDR. Those two are
almost always used together. When you want to get the links you usually
also want to get the addresses associated with it right after.
In a prior discussion we agreed that network namespace identifiers are
the way to go forward but that any other propery, i.e. PIDs and fds
should never be ported into other parts of the codebase and that is
indeed something I agree with.
> In case of one day we finally reach rtnl unscalability trap,
> every common interface like this may be a decisive nail in
> a coffin lid of possibility to overwrite everything.
>
> > The problem with what you're proposing is that the daemon would need to
> > cache a socket file descriptor for each container which is something
> > that we unfortunately cannot do since we can't excessively cache file
> > descriptors because we can easily hit the open file limit. We also
> > refrain from caching file descriptors for a long time for security
> > reasons.
> >
> > For the case where users just request a list of the interfaces we
> > can already use RTM_GETLINK + IFLA_IF_NETNS which has way better
> > performance. But we can't do the same with RTM_GETADDR requests which
> > was an oversight on my part when I wrote the original patchset for the
> > RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
> > RTM_GETADDR.
> > Based on this patchset I have written a userspace POC that is basically
> > a netns namespace aware getifaddr() or - as I like to call it -
> > netns_getifaddr().
> >
> >>
> >>> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
> >>> getifaddrs() style functions and friends). But currently, RTM_GETADDR
> >>> requests do not support a similar property like IFLA_IF_NETNSID for
> >>> RTM_*LINK requests.
> >>> This is problematic since userspace can retrieve interfaces from another
> >>> network namespace by sending a IFLA_IF_NETNSID property along but
> >>> RTM_GETLINK request but is still forced to use the legacy setns() style of
> >>> retrieving interfaces in RTM_GETADDR requests.
> >>>
> >>> The goal of this series is to make it possible to perform RTM_GETADDR
> >>> requests on different network namespaces. To this end a new IFA_IF_NETNSID
> >>> property for RTM_*ADDR requests is introduced. It can be used to send a
> >>> network namespace identifier along in RTM_*ADDR requests. The network
> >>> namespace identifier will be used to retrieve the target network namespace
> >>> in which the request is supposed to be fulfilled. This aligns the behavior
> >>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> >>>
> >>> Security:
> >>> - The caller must have assigned a valid network namespace identifier for
> >>> the target network namespace.
> >>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> >>> target network namespace.
> >>>
> >>> Thanks!
> >>> Christian
> >>>
> >>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> >>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> >>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> >>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> >>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> >>>
> >>> Christian Brauner (5):
> >>> rtnetlink: add rtnl_get_net_ns_capable()
> >>> if_addr: add IFA_IF_NETNSID
> >>> ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> >>> ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> >>> rtnetlink: move type calculation out of loop
> >>>
> >>> include/net/rtnetlink.h | 1 +
> >>> include/uapi/linux/if_addr.h | 1 +
> >>> net/core/rtnetlink.c | 15 +++++---
> >>> net/ipv4/devinet.c | 38 +++++++++++++++-----
> >>> net/ipv6/addrconf.c | 70 ++++++++++++++++++++++++++++--------
> >>> 5 files changed, 97 insertions(+), 28 deletions(-)
> >>>