Re: [RFC net-next 1/6] net: add kfree_skb_for_sk function
From: Eric Dumazet
Date: Fri May 31 2024 - 13:33:08 EST
On Fri, May 31, 2024 at 6:58 PM Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
>
> Hi Eric,
>
> Thanks for the feedback.
>
> On Fri, May 31, 2024 at 1:51 AM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Thu, May 30, 2024 at 11:46 PM Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Implement a new kfree_skb_for_sk to replace kfree_skb_reason on a few
> > > local receive paths. The function accepts an extra receiving socket
> > > argument, which will be set in skb->cb for kfree_skb/consume_skb
> > > tracepoint consumption. With this extra bit of information, it will be
> > > easier to attribute dropped packets to netns/containers and
> > > sockets/services for performance and error monitoring purposes.
> >
> > This is a lot of code churn...
> >
> > I have to ask: why not simply add an sk parameter to an existing
> > tracepoint?
> >
> Modifying a signature of the current tracepoint seems like a breaking
> change, that's why I was saving the context inside skb->cb, hoping to
> not impact any existing programs watching this tracepoint. But
> thinking about it again, it might not cause a problem if the signature
> becomes:
>
> trace_kfree_skb(const struct sk_buff *skb, void *location, enum
> skb_drop_reason reason, const struct sock *sk)
>
> As return values are usually not a thing for tracepoints, it is
> probably still compatible. The downside is that the last "sk" still breaks
> the integrity of naming. How about making a "kfree_skb_context"
> internal struct and putting it as the last argument to "hide" the
> naming confusion?
>
> > If this is not possible, I would rather add new tracepoints, adding new classes,
> > because it will ease your debugging :
> >
> > When looking for TCP drops, simply use a tcp_event_sk_skb_reason instance,
> > and voila, no distractions caused by RAW/ICMP/ICMPv6/af_packet drops.
> >
> > DECLARE_EVENT_CLASS(tcp_event_sk_skb_reason,
> >
> > TP_PROTO(const struct sock *sk, const struct sk_buff *skb, enum
> > skb_drop_reason reason),
> > ...
> > );
>
> The alternative of adding another tracepoint could indeed work, we had
> a few cases like that in the past, e.g.
>
> https://lore.kernel.org/lkml/20230711043453.64095-1-ivan@xxxxxxxxxxxxxx/
> https://lore.kernel.org/netdev/20230707043923.35578-1-ivan@xxxxxxxxxxxxxx/
>
> But it does feel like a whack-a-mole thing. The problems are solvable
> if we extend the kfree_skb tracepoint, so I would prefer to not add a
> new tracepoint.
Solvable with many future merge conflicts for stable teams.
>
> >
> > Also, the name (kfree_skb_for_sk) and the order of parameters are confusing.
> >
> > I always prefer this kind of ordering/names :
> >
> > void sk_skb_reason_drop( [struct net *net ] // not relevant here, but
> > to expand the rationale
> > struct sock *sk, struct sk_buff *skb, enum skb_drop_reason reason)
> >
> > Looking at the name, we immediately see the parameter order.
> >
> > The consume one (no @reason there) would be called
> >
> > void sk_skb_consume(struct sock *sk, struct sk_buff *skb);
>
> I was intending to keep the "kfree_skb" prefix initially since it
> would appear less surprising to kernel developers who used kfree_skb
> and kfree_skb_reason. But your points do make good sense. How about
> "kfree_sk_skb_reason" and "consume_sk_skb" here?
>
IMO kfree_skb() and consume_skb() were the wrong choice, but we have to
live with them.
They should have been skb_free(), skb_consume(), skb_alloc(),
to be consistent.
The following (partial) list was much better:
skb_add_rx_frag_netmem, skb_coalesce_rx_frag, skb_pp_cow_data,
skb_cow_data_for_xdp,
skb_dump, skb_tx_error, skb_morph, skb_zerocopy_iter_stream, skb_copy_ubufs,
skb_clone, skb_headers_offset_update, skb_copy_header, skb_copy,
skb_realloc_headroom, skb_expand_head, skb_copy_expand, skb_put,
skb_push, skb_pull, skb_pull_data, skb_trim, skb_copy_bits,
skb_splice_bits, skb_send_sock_locked, skb_store_bits,
skb_checksum, skb_copy_and_csum_bits, skb_zerocopy_headlen,
skb_zerocopy, skb_copy_and_csum_dev, skb_dequeue,
skb_dequeue_tail, skb_queue_purge_reason, skb_errqueue_purge,
skb_queue_head, skb_queue_tail, skb_unlink, skb_append,
skb_split, skb_prepare_seq_read, skb_seq_read, skb_abort_seq_read,
skb_find_text, skb_append_pagefrags, skb_pull_rcsum, skb_segment_list,
skb_segment, skb_to_sgvec, skb_to_sgvec_nomark, skb_cow_data, skb_clone_sk,
skb_complete_tx_timestamp, skb_tstamp_tx, skb_complete_wifi_ack,
skb_partial_csum_set, skb_checksum_setup, skb_checksum_trimmed,
skb_try_coalesce, skb_scrub_packet, skb_vlan_untag, skb_ensure_writable,
skb_ensure_writable_head_tail, skb_vlan_pop, skb_vlan_push, skb_eth_pop,
skb_eth_push, skb_mpls_push, skb_mpls_pop, skb_mpls_update_lse,
skb_mpls_dec_ttl, skb_condense, skb_ext_add, skb_splice_from_iter
(just to make my point very very clear)
Instead we have a myriad of functions with illogical parameter
ordering vs their names.
I see no reason to add more confusion for new helpers.
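
As a minimal userspace sketch of the naming rule above (name spells out
the parameter order: sk, then skb, then reason). The struct layouts, the
last_event record standing in for a tracepoint, and sk_skb_naming_demo()
are hypothetical stand-ins for illustration, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel definitions. The enum only borrows
 * two value names from the kernel's enum skb_drop_reason. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_NO_SOCKET,
};

struct sock { int id; };
struct sk_buff { unsigned int len; };

/* Hypothetical record of the last event; stands in for a tracepoint. */
struct drop_event {
	const struct sock *sk;
	const struct sk_buff *skb;
	enum skb_drop_reason reason;
	int consumed;
};

static struct drop_event last_event;

/* sk_skb_reason_drop: parameters appear in the order the name lists them. */
void sk_skb_reason_drop(struct sock *sk, struct sk_buff *skb,
			enum skb_drop_reason reason)
{
	if (!skb)
		return;
	last_event = (struct drop_event){
		.sk = sk, .skb = skb, .reason = reason, .consumed = 0,
	};
}

/* sk_skb_consume: same sk-then-skb order, no reason argument. */
void sk_skb_consume(struct sock *sk, struct sk_buff *skb)
{
	if (!skb)
		return;
	last_event = (struct drop_event){
		.sk = sk, .skb = skb,
		.reason = SKB_DROP_REASON_NOT_SPECIFIED, .consumed = 1,
	};
}

/* Hypothetical usage: one drop, then one consume, on the same socket. */
int sk_skb_naming_demo(void)
{
	struct sock sk = { .id = 1 };
	struct sk_buff skb = { .len = 64 };

	sk_skb_reason_drop(&sk, &skb, SKB_DROP_REASON_NO_SOCKET);
	assert(last_event.sk == &sk && last_event.skb == &skb);
	assert(last_event.reason == SKB_DROP_REASON_NO_SOCKET);
	assert(!last_event.consumed);

	sk_skb_consume(&sk, &skb);
	assert(last_event.consumed);
	return 0;
}
```

A caller reading sk_skb_reason_drop(sk, skb, reason) does not need to
look up the prototype to know which argument goes where.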