Re: [PATCH NET-NEXT] ipv6: skb_expand_head() adjust skb->truesize incorrectly
From: Eric Dumazet
Date: Fri Aug 27 2021 - 12:47:37 EST
On 8/27/21 8:23 AM, Vasily Averin wrote:
> I asked Alexey Kuznetsov to look at this problem. Below is his answer:
> "I think the current scheme is obsolete. It was created
> when we had only two kinds of skb accounting (rmem & wmem)
> and with more kinds of accounting it just does not work.
> Even there we had ignored problems with adjusting accounting.
>
> Logically the best solution would be replacing ->destructor,
> set_owner* etc with skb_ops. Something like:
>
> struct skb_ops
> {
> void init(struct sk_buff * skb, struct skb_ops * ops, struct
> sock * owner);
> void fini(struct sk_buff * skb);
> void update(struct sk_buff * skb, int adjust);
> void inherit(struct sk_buff * skb2, struct sk_buff * skb);
> };
>
> init - is replacement for skb_set_owner_r|w
> fini - is replacement for skb_orphan
> update - is new operation to be used in places where skb->truesize changes,
> instead of awful constructions like:
>
> if (!skb->sk || skb->destructor == sock_edemux)
> skb->truesize += size - osize;
>
> Now it will look like:
>
> if (skb->ops)
> skb->ops->update(skb, size - osize);
>
> inherit - is replacement for also awful constructs like:
>
> if (skb->sk)
> skb_set_owner_w(skb2, skb->sk);
>
> Now it will be:
>
> if (skb->ops)
> skb->ops->inherit(skb2, skb);
>
> The implementation looks mostly obvious.
> Some troubles can be only with new functionality:
> update of accounting was never done before.
>
>
> More efficient, functionally equivalent, but uglier and less flexible
> alternative would be removal of ->destructor, replaced with
> a small numeric indicator of ownership:
>
> enum
> {
> SKB_OWNER_NONE, /* aka destructor == NULL */
> SKB_OWNER_WMEM, /* aka destructor == sk_wfree */
> SKB_OWNER_RMEM, /* aka destructor == sk_rfree */
> SKB_OWNER_SK, /* aka destructor == sk_edemux */
> SKB_OWNER_TCP, /* aka destructor == tcp_wfree */
> }
>
> And the same init,fini,inherit,update become functions
> w/o any inidirect calls. Not sure it is really more efficient though."
>
Well, this does not look as stable material, and would add a bunch
of indirect calls which are quite expensive these days (CONFIG_RETPOLINE=y)
I suggest we work on a fix, using existing infra, then eventually later
try to refactor if this is really bringing improvements.
A fix could simply be a revert of 0c9f227bee119 ("ipv6: use skb_expand_head in ip6_xmit")
since only IPv6 has the problem (because of arbitrary headers size)