Re: [PATCH net 1/1 v2] rtnetlink: require unique netns identifier
From: Kirill Tkhai
Date: Mon Feb 05 2018 - 16:48:14 EST
On 05.02.2018 18:55, Christian Brauner wrote:
> Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
> it is possible for userspace to send us requests with three different
> properties to identify a target network namespace. This affects at least
> RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
> network namespace which is confusing. For legacy reasons the kernel will
> pick the IFLA_NET_NS_PID property first and then look for the
> IFLA_NET_NS_FD property but there is no reason to extend this type of
> behavior to network namespace ids. The regression potential is quite
> minimal since the rtnetlink requests in question either won't allow
> IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
> support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
>> Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> ---
> ChangeLog v1->v2:
> * return errno when the specified network namespace id is invalid
> * fill in struct netlink_ext_ack if the network namespace id is invalid
> * rename rtnl_ensure_unique_netns_attr() to rtnl_ensure_unique_netns() to
> indicate that a request without any network namespace identifying attributes
> is also considered valid.
>
> ChangeLog v0->v1:
> * report a descriptive error to userspace via struct netlink_ext_ack
> * do not fail when multiple properties specifiy the same network namespace
> ---
> net/core/rtnetlink.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 69 insertions(+)
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 56af8e41abfc..c096c4ff9a00 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1951,6 +1951,59 @@ static struct net *rtnl_link_get_net_capable(const struct sk_buff *skb,
> return net;
> }
>
> +/* Verify that rtnetlink requests supporting network namespace ids
> + * do not pass additional properties referring to different network
> + * namespaces.
> + */
> +static int rtnl_ensure_unique_netns(const struct sock *sk, struct nlattr *tb[],
> + struct netlink_ext_ack *extack)
> +{
> + int ret = -EINVAL;
> + struct net *net = NULL, *unique_net = NULL;
> +
> + /* Requests without network namespace ids have been able to specify
> + * multiple properties referring to different network namespaces so
> + * don't regress them.
> + */
> + if (!tb[IFLA_IF_NETNSID])
> + return 0;
> +
> + /* Caller operates on the current network namespace. */
> + if (!tb[IFLA_NET_NS_PID] && !tb[IFLA_NET_NS_FD])
> + return 0;
> +
> + unique_net = get_net_ns_by_id(sock_net(sk), nla_get_s32(tb[IFLA_IF_NETNSID]));
> + if (!unique_net) {
> + NL_SET_ERR_MSG(extack, "invalid network namespace id");
> + return ret;
> + }
> +
> + if (tb[IFLA_NET_NS_PID]) {
> + net = get_net_ns_by_pid(nla_get_u32(tb[IFLA_NET_NS_PID]));
> + if (net != unique_net)
> + goto on_error;
> + }
> +
> + if (tb[IFLA_NET_NS_FD]) {
> + net = get_net_ns_by_fd(nla_get_u32(tb[IFLA_NET_NS_FD]));
> + if (net != unique_net)
> + goto on_error;
> + }
> +
> + ret = 0;
> +
> +on_error:
> + put_net(unique_net);
> +
> + if (net && !IS_ERR(net))
> + put_net(net);
1)When we have tb[IFLA_NET_NS_PID and tb[IFLA_NET_NS_FD] both set and pointing
to the same net, this function increments net::count in get_net_ns_by_pid() and
in get_net_ns_by_fd(), i.e. twice. But only single put_net(net) will be called.
So, after this function net::count will be incremented by 1, and it never will
die.
2)The whole approach does not seem good for me. The first reason is it's racy.
Even if rtnl_ensure_unique_netns() returns 0, this does not guarantees that
tb[IFLA_IF_NETNSID] and tb[IFLA_NET_NS_PID] will be point the same net later,
as the pid may die or do setns(). Racy check is worse than no check at all.
The second reason is after this patch get_net_ns_by_id/get_net_ns_by_pid()/
get_net_ns_by_fd() will be called twice: the first time is in your check
and the second time is where they are actually used. This is not good for
performance.
What is the problem people pass several different tb[xxx] in one call? We
may just describe the order of tb[xxx] in man page and their priorities,
and ignore the rest after the first not zero tb[xxx] is found, and do that
in the place, where net from tb[xxx] in actually used. This is the thing
we already do.
Comparing to classic Linux interface such as syscalls, it's usual behavior
for them to ignore one argument, when another is set. Nobody confuses.
Kirill