Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf

From: Menglong Dong
Date: Fri Aug 26 2022 - 00:47:23 EST


On Thu, Aug 25, 2022 at 11:32 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Wed, Aug 24, 2022 at 10:37 PM <menglong8.dong@xxxxxxxxx> wrote:
> >
> > From: Menglong Dong <imagedong@xxxxxxxxxxx>
> >
> > As Eric reported, the 'reason' field is not presented when trace the
> > kfree_skb event by perf:
> >
> > $ perf record -e skb:kfree_skb -a sleep 10
> > $ perf script
> > ip_defrag 14605 [021] 221.614303: skb:kfree_skb:
> > skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1
> > reason:
> >
> > The cause seems to be passing kernel address directly to TP_printk(),
> > which is not right.
>
> Why ?
>

I think it is because of how perf passes data to the user space. From
what 'perf_output_sample()' do, we can know that perf passes the data
of entry (with other data) to the user, and the user generates the output
string from the format string (which can be obtained from
/sys/kernel/debug/tracing/event/skb/kfree_skb/format) and the entry data.

Therefore, perf can't get the string of drop reasons from the entry, only
the enum.

> It seems this adds an expensive copy of a string that should reside in
> rodata section of vmlinux, thus can not disappear...
> (Also the ring buffer entry will have a dynamic size ?)
>

It indeed will add additional cost, but it seems unavoidable. In the old
version, __print_symbolic() is used, which will loop all the drop reason
from a array and find corresponding string:

TP_printk("skbaddr=%p protocol=%u location=%p reason: %s",
__entry->skbaddr, __entry->protocol, __entry->location,
__print_symbolic(__entry->reason,
TRACE_SKB_DROP_REASON))

And I think the cost of coping strings may be less than this loop? as the
drop reasons are getting more and more.

> trace_safe_str() is using is_kernel_rodata() which should return true
> for drop_reasons[X] ?
>
> $ grep drop_reasons net/core/dropreason_str.c
> const char * const drop_reasons[] = {
> ...
>
>
>
> >
> > Therefore, fix this by adding a '__string' field to the TP_STRUCT of
> > kfree_skb, which is 'reason_str', and passing it to TP_printk().
> >
> > (Not sure if we should still keep the 'reason' field in
> > TP_STRUCT__entry)
>
> Maybe for event/trace filtering purposes ?
>
> >
> > Reported-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Signed-off-by: Menglong Dong <imagedong@xxxxxxxxxxx>
> > ---
> > include/trace/events/skb.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> > index 45264e4bb254..7235554141c3 100644
> > --- a/include/trace/events/skb.h
> > +++ b/include/trace/events/skb.h
> > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb,
> > __field(void *, location)
> > __field(unsigned short, protocol)
> > __field(enum skb_drop_reason, reason)
> > + __string(reason_str, drop_reasons[reason])
> > ),
> >
> > TP_fast_assign(
> > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb,
> > __entry->location = location;
> > __entry->protocol = ntohs(skb->protocol);
> > __entry->reason = reason;
> > + __assign_str(reason_str, drop_reasons[reason]);
> > ),
> >
> > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s",
> > __entry->skbaddr, __entry->protocol, __entry->location,
> > - drop_reasons[__entry->reason])
> > + __get_str(reason_str))
> > );
> >
> > TRACE_EVENT(consume_skb,
> > --
> > 2.37.2
> >