Re: [RFC net-next 1/6] net: add kfree_skb_for_sk function

From: Yan Zhai
Date: Fri May 31 2024 - 12:58:36 EST


Hi Eric,

Thanks for the feedback.

On Fri, May 31, 2024 at 1:51 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Thu, May 30, 2024 at 11:46 PM Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
> >
> > Implement a new kfree_skb_for_sk to replace kfree_skb_reason on a few
> > local receive path. The function accepts an extra receiving socket
> > argument, which will be set in skb->cb for kfree_skb/consume_skb
> > tracepoint consumption. With this extra bit of information, it will be
> > easier to attribute dropped packets to netns/containers and
> > sockets/services for performance and error monitoring purposes.
>
> This is a lot of code churn...
>
> I have to ask : Why not simply adding an sk parameter to an existing
> trace point ?
>
Modifying the signature of the current tracepoint seems like a breaking
change, which is why I was saving the context inside skb->cb, hoping not
to impact any existing programs watching this tracepoint. But on second
thought, it might not cause a problem if the signature becomes:

trace_kfree_skb(const struct sk_buff *skb, void *location,
                enum skb_drop_reason reason, const struct sock *sk)

Since tracepoints do not normally have return values, this is probably
still compatible. The downside is that the trailing "sk" argument still
breaks the naming consistency. How about making a "kfree_skb_context"
internal struct and putting it as the last argument to "hide" the
naming confusion?
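
To make the "kfree_skb_context" idea a bit more concrete, here is a rough
userspace sketch, not kernel code. The struct layout, the DEMO_* reason
values, the demo_trace_record type and the mock trace_kfree_skb() body are
all assumptions made up for illustration; only the argument order follows
the signature discussed above.

```c
#include <stddef.h>

/* Userspace stand-ins for kernel types; not the real definitions. */
struct sock { int id; };
struct sk_buff { unsigned int len; };
enum skb_drop_reason {
	DEMO_REASON_NOT_SPECIFIED,	/* placeholder values for this sketch only */
	DEMO_REASON_NO_SOCKET,
};

/* Hypothetical wrapper: extra drop context passed as one trailing argument. */
struct kfree_skb_context {
	const struct sock *sk;	/* receiving socket, or NULL when unknown */
};

/* What a mock consumer of the tracepoint would observe. */
struct demo_trace_record {
	const struct sk_buff *skb;
	void *location;
	enum skb_drop_reason reason;
	const struct sock *sk;
};

struct demo_trace_record demo_last;

/* Mock of the extended tracepoint: the first three arguments keep their order. */
void trace_kfree_skb(const struct sk_buff *skb, void *location,
		     enum skb_drop_reason reason,
		     const struct kfree_skb_context *ctx)
{
	demo_last.skb = skb;
	demo_last.location = location;
	demo_last.reason = reason;
	/* Callers without socket information can pass a NULL context. */
	demo_last.sk = ctx ? ctx->sk : NULL;
}
```

If more context is needed later, it could be added as a field of the
struct without touching the argument list again, at least in this sketch.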

> If this not possible, I would rather add new tracepoints, adding new classes,
> because it will ease your debugging :
>
> When looking for TCP drops, simply use a tcp_event_sk_skb_reason instance,
> and voila, no distractions caused by RAW/ICMP/ICMPv6/af_packet drops.
>
> DECLARE_EVENT_CLASS(tcp_event_sk_skb_reason,
>
> TP_PROTO(const struct sock *sk, const struct sk_buff *skb, enum
> skb_drop_reason reason),
> ...
> );

The alternative of adding another tracepoint could indeed work; we have
had a few cases like that in the past, e.g.

https://lore.kernel.org/lkml/20230711043453.64095-1-ivan@xxxxxxxxxxxxxx/
https://lore.kernel.org/netdev/20230707043923.35578-1-ivan@xxxxxxxxxxxxxx/

But it does feel like playing whack-a-mole. The problems are solvable
if we extend the kfree_skb tracepoint, so I would prefer not to add a
new tracepoint.

>
> Also, the name ( kfree_skb_for_sk) and order of parameters is confusing.
>
> I always prefer this kind of ordering/names :
>
> void sk_skb_reason_drop( [struct net *net ] // not relevant here, but
> to expand the rationale
> struct sock *sk, struct sk_buff *skb, enum skb_drop_reason reason)
>
> Looking at the name, we immediately see the parameter order.
>
> The consume one (no @reason there) would be called
>
> void sk_skb_consume(struct sock *sk, struct sk_buff *skb);

I initially intended to keep the "kfree_skb" prefix since it would be
less surprising to kernel developers who have used kfree_skb and
kfree_skb_reason. But your points make good sense. How about
"kfree_sk_skb_reason" and "consume_sk_skb" here?

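For reference, a rough userspace mock of how the two proposed names could
look with the sk-first parameter order. This is only a sketch: the types
are stand-ins, the DEMO_* values and the demo_event bookkeeping are
hypothetical, and the mock records the call instead of freeing anything.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; not the real definitions. */
struct sock { int id; };
struct sk_buff { unsigned int len; };
enum skb_drop_reason {
	DEMO_REASON_NOT_SPECIFIED,	/* placeholder values for this sketch only */
	DEMO_REASON_SOCKET_FILTER,
};

/* Hypothetical record of the last drop/consume call, for inspection. */
struct demo_event {
	const struct sock *sk;
	const struct sk_buff *skb;
	enum skb_drop_reason reason;
	bool consumed;
};

struct demo_event demo_last_event;
int demo_event_count;

/* Drop path: sk first, then skb, then reason, as the name suggests. */
void kfree_sk_skb_reason(struct sock *sk, struct sk_buff *skb,
			 enum skb_drop_reason reason)
{
	if (!skb)
		return;	/* a NULL skb is ignored in this mock */
	demo_last_event = (struct demo_event){
		.sk = sk, .skb = skb, .reason = reason, .consumed = false,
	};
	demo_event_count++;
}

/* Consume path: same ordering, no reason argument. */
void consume_sk_skb(struct sock *sk, struct sk_buff *skb)
{
	if (!skb)
		return;
	demo_last_event = (struct demo_event){
		.sk = sk, .skb = skb,
		.reason = DEMO_REASON_NOT_SPECIFIED,	/* unused on this path */
		.consumed = true,
	};
	demo_event_count++;
}
```
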
thanks
Yan