Re: [PATCH NET-NEXT] ipv6: skb_expand_head() adjust skb->truesize incorrectly

From: Vasily Averin
Date: Sat Aug 28 2021 - 04:01:32 EST


On 8/27/21 7:47 PM, Eric Dumazet wrote:
>
>
> On 8/27/21 8:23 AM, Vasily Averin wrote:
>
>> I asked Alexey Kuznetsov to look at this problem. Below is his answer:
>> "I think the current scheme is obsolete. It was created
>> when we had only two kinds of skb accounting (rmem & wmem)
>> and with more kinds of accounting it just does not work.
>> Even there we had ignored problems with adjusting accounting.
>>
>> Logically the best solution would be replacing ->destructor,
>> set_owner* etc with skb_ops. Something like:
>>
>> struct skb_ops
>> {
>> void init(struct sk_buff * skb, struct skb_ops * ops, struct
>> sock * owner);
>> void fini(struct sk_buff * skb);
>> void update(struct sk_buff * skb, int adjust);
>> void inherit(struct sk_buff * skb2, struct sk_buff * skb);
>> };
>>
>> init - is replacement for skb_set_owner_r|w
>> fini - is replacement for skb_orphan
>> update - is new operation to be used in places where skb->truesize changes,
>> instead of awful constructions like:
>>
>> if (!skb->sk || skb->destructor == sock_edemux)
>> skb->truesize += size - osize;
>>
>> Now it will look like:
>>
>> if (skb->ops)
>> skb->ops->update(skb, size - osize);
>>
>> inherit - is replacement for also awful constructs like:
>>
>> if (skb->sk)
>> skb_set_owner_w(skb2, skb->sk);
>>
>> Now it will be:
>>
>> if (skb->ops)
>> skb->ops->inherit(skb2, skb);
>>
>> The implementation looks mostly obvious.
>> Some troubles can be only with new functionality:
>> update of accounting was never done before.
>>
>>
>> More efficient, functionally equivalent, but uglier and less flexible
>> alternative would be removal of ->destructor, replaced with
>> a small numeric indicator of ownership:
>>
>> enum
>> {
>> SKB_OWNER_NONE, /* aka destructor == NULL */
>> SKB_OWNER_WMEM, /* aka destructor == sk_wfree */
>> SKB_OWNER_RMEM, /* aka destructor == sk_rfree */
>> SKB_OWNER_SK, /* aka destructor == sk_edemux */
>> SKB_OWNER_TCP, /* aka destructor == tcp_wfree */
>> }
>>
>> And the same init,fini,inherit,update become functions
>> w/o any inidirect calls. Not sure it is really more efficient though."
>>
>
> Well, this does not look as stable material, and would add a bunch
> of indirect calls which are quite expensive these days (CONFIG_RETPOLINE=y)
>
> I suggest we work on a fix, using existing infra, then eventually later
> try to refactor if this is really bringing improvements.
>
> A fix could simply be a revert of 0c9f227bee119 ("ipv6: use skb_expand_head in ip6_xmit")
> since only IPv6 has the problem (because of arbitrary headers size)

I think it is not enough.

Root of the problem is that skb_expand_head() works incorrectly with non-shared skb.
In this case it do not call skb_clone before pskb_expand_head() execution,
and as result pskb_expand_head() and does not adjust skb->truesize.

I think non-shared skb is more frequent case,
so all skb_expand_head() are affected.

Therefore we need to revert all my patch set in net-next:
f1260ff skbuff: introduce skb_expand_head()
e415ed3 ipv6: use skb_expand_head in ip6_finish_output2
0c9f227 ipv6: use skb_expand_head in ip6_xmit
5678a59 ipv4: use skb_expand_head in ip_finish_output2
14ee70c vrf: use skb_expand_head in vrf_finish_output
53744a4 ax25: use skb_expand_head
a1e975e bpf: use skb_expand_head in bpf_out_neigh_v4/6
07e1d6b Merge branch 'skb_expand_head'
with fixup
06669e6 vrf: fix NULL dereference in vrf_finish_output()

And then rework ip6_finish_output2() in upstream,
to call skb_realloc_headroom() like it was done in first patch version:
https://lkml.org/lkml/2021/7/7/469.

Thank you,
Vasily Averin