Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

From: Jason A. Donenfeld
Date: Fri Nov 10 2017 - 21:26:27 EST


IIRC, 4.14 comes tomorrow-ish? If possible, it would be nice to get
this in 4.14 before then, so it doesn't have to take time to trickle
down through stable.

Jason

On Thu, Nov 9, 2017 at 1:04 PM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> The way people generally use netlink_dump is that they fill in the skb
> as much as possible, breaking when nla_put returns an error. Then, they
> get called again and start filling out the next skb, and again, and so
> forth. The mechanism at work here is the ability for the iterative
> dumping function to detect when the skb is filled up and not fill it
> past the brim, waiting for a fresh skb for the rest of the data.
>
> However, if the attributes are small and nicely packed, it is possible
> that a dump callback function successfully fills in attributes until the
> skb is of size 4080 (libmnl's default page-sized receive buffer size).
> The dump function completes, satisfied, and then, if it happens to be
> that this is actually the last skb, and no further ones are to be sent,
> then netlink_dump will add on the NLMSG_DONE part:
>
> nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>
> It is very important that netlink_dump does this, of course. However, in
> this example, that call to nlmsg_put_answer will fail, because the
> previous filling by the dump function did not leave it enough room. And
> how could it possibly have done so? All of the nla_put variety of
> functions simply check to see if the skb has enough tailroom,
> independent of the context it is in.
>
> In order to keep the important assumptions of all netlink dump users, it
> is therefore important to give them an skb that has this end part of the
> tail already reserved, so that the call to nlmsg_put_answer does not
> fail. Otherwise, library authors are forced to find some bizarre sized
> receive buffer that has a large modulo relative to the common sizes of
> messages received, which is ugly and buggy.
>
> This patch thus saves the NLMSG_DONE for an additional message, for the
> case that things are dangerously close to the brim. This requires
> keeping track of the errno from ->dump() across calls.
>
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> Can we get this into 4.14? Is there still time? It should also be queued
> up for stable.
>
> Changes v3->v4:
> - I like long lines. The kernel does not. Wrapped at 80 now.
>
> net/netlink/af_netlink.c | 17 +++++++++++------
> net/netlink/af_netlink.h | 1 +
> 2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index b93148e8e9fb..15c99dfa3d72 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
> struct sk_buff *skb = NULL;
> struct nlmsghdr *nlh;
> struct module *module;
> - int len, err = -ENOBUFS;
> + int err = -ENOBUFS;
> int alloc_min_size;
> int alloc_size;
>
> @@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
> skb_reserve(skb, skb_tailroom(skb) - alloc_size);
> netlink_skb_set_owner_r(skb, sk);
>
> - len = cb->dump(skb, cb);
> + if (nlk->dump_done_errno > 0)
> + nlk->dump_done_errno = cb->dump(skb, cb);
>
> - if (len > 0) {
> + if (nlk->dump_done_errno > 0 ||
> + skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
> mutex_unlock(nlk->cb_mutex);
>
> if (sk_filter(sk, skb))
> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
> return 0;
> }
>
> - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> - if (!nlh)
> + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> + sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> + if (WARN_ON(!nlh))
> goto errout_skb;
>
> nl_dump_check_consistent(cb, nlh);
>
> - memcpy(nlmsg_data(nlh), &len, sizeof(len));
> + memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
> + sizeof(nlk->dump_done_errno));
>
> if (sk_filter(sk, skb))
> kfree_skb(skb);
> @@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
> }
>
> nlk->cb_running = true;
> + nlk->dump_done_errno = INT_MAX;
>
> mutex_unlock(nlk->cb_mutex);
>
> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
> index 028188597eaa..962de7b3c023 100644
> --- a/net/netlink/af_netlink.h
> +++ b/net/netlink/af_netlink.h
> @@ -34,6 +34,7 @@ struct netlink_sock {
> wait_queue_head_t wait;
> bool bound;
> bool cb_running;
> + int dump_done_errno;
> struct netlink_callback cb;
> struct mutex *cb_mutex;
> struct mutex cb_def_mutex;
> --
> 2.15.0
>