Re: [PATCH] net: correct zerocopy refcnt with newly allocated UDP or RAW uarg
From: Willem de Bruijn
Date: Fri Aug 14 2020 - 11:45:41 EST
On Fri, Aug 14, 2020 at 10:17 AM linmiaohe <linmiaohe@xxxxxxxxxx> wrote:
>
> Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:
> >On Thu, Aug 13, 2020 at 1:59 PM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote:
> >>
> >> The var extra_uref is introduced to pass the initial reference taken
> >> in sock_zerocopy_alloc to the first generated skb. But now we may fail
> >> to pass the initial reference with newly allocated UDP or RAW uarg
> >> when the skb is zcopied.
> >
> >extra_uref is true if there is no previous skb to append to or there is a previous skb, but that does not have zerocopy data associated yet (because the previous call(s) did not set MSG_ZEROCOPY).
> >
> >In other words, when first (allocating and) associating a zerocopy struct with the skb.
>
> Many thanks for your explaination. The var extra_uref plays the role as you say. I just borrowed the description of var extra_uref from previous commit log here.
>
> >
> >> - extra_uref = !skb_zcopy(skb); /* only ref on new uarg */
> >> + /* Only ref on newly allocated uarg. */
> >> + if (!skb_zcopy(skb) || (sk->sk_type != SOCK_STREAM && skb_zcopy(skb) != uarg))
> >> + extra_uref = true;
> >
> >SOCK_STREAM does not use __ip_append_data.
> >
> >This leaves as new branch skb_zcopy(skb) && skb_zcopy(skb) != uarg.
> >
> >This function can only acquire a uarg through sock_zerocopy_realloc, which on skb_zcopy(skb) only returns the existing uarg or NULL (for not SOCK_STREAM).
> >
> >So I don't see when that condition can happen.
> >
>
> On skb_zcopy(skb), we returns the existing uarg iff (uarg->id + uarg->len == atomic_read(&sk->sk_zckey)) in sock_zerocopy_realloc. So we may get a newly allocated
> uarg via sock_zerocopy_alloc(). Though we may not trigger this codepath now, it's still a potential problem that we may missed the right trace to uarg.
I don't think that can happen.
The question is when this branch is false
next = (u32)atomic_read(&sk->sk_zckey);
if ((u32)(uarg->id + uarg->len) == next) {
I cannot come up with a case. I think it might be vestigial. The goal
is to ensure to append only a consecutive range of notification IDs.
Each notification ID corresponds to a sendmsg invocation with
MSG_ZEROCOPY. In both TCP and UDP with corking, data is ordered and
access to changes to these fields happen together as a transaction:
/* realloc only when socket is locked (TCP, UDP cork),
* so uarg->len and sk_zckey access is serialized
*/