Re: [RFC v3 net-next 1/7] net: add rx_sk to trace_kfree_skb

From: Yan Zhai
Date: Thu Jun 06 2024 - 11:38:19 EST


On Wed, Jun 5, 2024 at 6:57 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Tue, 4 Jun 2024 14:47:38 -0700
> Yan Zhai <yan@xxxxxxxxxxxxxx> wrote:
>
> > skb does not include enough information to find out receiving
> > sockets/services and netns/containers on packet drops. In theory
> > skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
> > stack for OOO packet lookup. Similarly, skb->sk often identifies a local
> > sender, and tells nothing about a receiver.
> >
> > Allow passing an extra receiving socket to the tracepoint to improve
> > the visibility on receiving drops.
> >
> > Signed-off-by: Yan Zhai <yan@xxxxxxxxxxxxxx>
> > ---
> > v2->v3: fixed drop_monitor function prototype
> > ---
> > include/trace/events/skb.h | 11 +++++++----
> > net/core/dev.c | 2 +-
> > net/core/drop_monitor.c | 9 ++++++---
> > net/core/skbuff.c | 2 +-
> > 4 files changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > index 07e0715628ec..aa6b46b6172c 100644
> > --- a/include/trace/events/skb.h
> > +++ b/include/trace/events/skb.h
> > @@ -24,15 +24,16 @@ DEFINE_DROP_REASON(FN, FN)
> > TRACE_EVENT(kfree_skb,
> >
> > TP_PROTO(struct sk_buff *skb, void *location,
> > - enum skb_drop_reason reason),
> > + enum skb_drop_reason reason, struct sock *rx_sk),
> >
> > - TP_ARGS(skb, location, reason),
> > + TP_ARGS(skb, location, reason, rx_sk),
> >
> > TP_STRUCT__entry(
> > __field(void *, skbaddr)
> > __field(void *, location)
> > __field(unsigned short, protocol)
> > __field(enum skb_drop_reason, reason)
> > + __field(void *, rx_skaddr)
>
> Please add the pointer after the other pointers:
>
> __field(void *, skbaddr)
> __field(void *, location)
> + __field(void *, rx_skaddr)
> __field(unsigned short, protocol)
> __field(enum skb_drop_reason, reason)
>
> otherwise you are adding holes in the ring buffer event.
>
> The TP_STRUCT__entry() is a structure that is saved in the ring buffer. We
> want to avoid alignment holes. I also question having a short before the
> enum, if the emum is 4 bytes. The short should be at the end.
>
> In fact, looking at the format file, there is a 2 byte hole:
>
> # cat /sys/kernel/tracing/events/skb/kfree_skb/format
>
> name: kfree_skb
> ID: 1799
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
>
> field:void * skbaddr; offset:8; size:8; signed:0;
> field:void * location; offset:16; size:8; signed:0;
> field:unsigned short protocol; offset:24; size:2; signed:0;
> field:enum skb_drop_reason reason; offset:28; size:4; signed:0;
>
> Notice that "protocol" is 2 bytes in size at offset 24, but "reason" starts
> at offset 28. This means at offset 26, there's a 2 byte hole.
>
The reason I added the pointer as the last argument is trying to
minimize the surprise to existing TP users, because for common ABIs
it's fine to omit later arguments when defining a function, but it
needs change and recompilation if the order of arguments changed.

Looking at the actual format after the change, it does not add a new
hole since protocol and reason are already packed into the same 8-byte
block, so rx_skaddr starts at 8-byte aligned offset:

# cat /sys/kernel/debug/tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 2260
format:
field:unsigned short common_type; offset:0;
size:2; signed:0;
field:unsigned char common_flags; offset:2;
size:1; signed:0;
field:unsigned char common_preempt_count; offset:3;
size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:void * skbaddr; offset:8; size:8; signed:0;
field:void * location; offset:16; size:8; signed:0;
field:unsigned short protocol; offset:24; size:2; signed:0;
field:enum skb_drop_reason reason; offset:28;
size:4; signed:0;
field:void * rx_skaddr; offset:32; size:8; signed:0;

Do you think we still need to change the order?

Yan


> -- Steve
>
>
>
> > ),
> >
> > TP_fast_assign(
> > @@ -40,12 +41,14 @@ TRACE_EVENT(kfree_skb,
> > __entry->location = location;
> > __entry->protocol = ntohs(skb->protocol);
> > __entry->reason = reason;
> > + __entry->rx_skaddr = rx_sk;
> > ),
> >
>