Re: [PATCH net-next] net-sysfs: display two backlog queue len separately

From: Eric Dumazet
Date: Mon Mar 13 2023 - 11:59:28 EST


On Mon, Mar 13, 2023 at 6:16 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> On Mon, Mar 13, 2023 at 8:34 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >
> > On Sat, Mar 11, 2023 at 7:18 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
> > >
> > > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> > >
> > > When debugging latency, we sometimes need to know which of the two
> > > backlog queues has grown long enough to cause it. Display the two
> > > queue lengths separately.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> > > ---
> > > net/core/net-procfs.c | 17 ++++++++++++-----
> > > 1 file changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > > index 1ec23bf8b05c..97a304e1957a 100644
> > > --- a/net/core/net-procfs.c
> > > +++ b/net/core/net-procfs.c
> > > @@ -115,10 +115,14 @@ static int dev_seq_show(struct seq_file *seq, void *v)
> > > return 0;
> > > }
> > >
> > > -static u32 softnet_backlog_len(struct softnet_data *sd)
> > > +static u32 softnet_input_pkt_queue_len(struct softnet_data *sd)
> > > {
> > > -        return skb_queue_len_lockless(&sd->input_pkt_queue) +
> > > -               skb_queue_len_lockless(&sd->process_queue);
> > > +        return skb_queue_len_lockless(&sd->input_pkt_queue);
> > > +}
> > > +
> > > +static u32 softnet_process_queue_len(struct softnet_data *sd)
> > > +{
> > > +        return skb_queue_len_lockless(&sd->process_queue);
> > > }
> > >
> > > static struct softnet_data *softnet_get_online(loff_t *pos)
> > > @@ -169,12 +173,15 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> > > * mapping the data a specific CPU
> > > */
> > >          seq_printf(seq,
> > > -                   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x\n",
> > > +                   "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > > +                   "%08x %08x\n",
> > >                     sd->processed, sd->dropped, sd->time_squeeze, 0,
> > >                     0, 0, 0, 0, /* was fastroute */
> > >                     0, /* was cpu_collision */
> > >                     sd->received_rps, flow_limit_count,
> > > -                   softnet_backlog_len(sd), (int)seq->index);
> > > +                   0, /* was len of two backlog queues */
> >
> > You cannot pretend the sum is zero; some user-space tools out there
> > would be fooled.
> >
> > > +                   (int)seq->index,
> > > +                   softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > >          return 0;
> > > }
> > >
> > > --
> > > 2.37.3
> > >
> >
> > In general I would prefer we no longer change this file.
>
> Fine. From now on, let this legacy file be a part of history.
>
> >
> > Perhaps add a tracepoint instead ?
>
> Thanks, Eric. That's a good idea. It seems acceptable if we only need
> to trace the two backlog queues separately at the point where they can
> hit the limit, say, in enqueue_to_backlog().


Note that enqueue_to_backlog() already passes a specific drop reason
(SKB_DROP_REASON_CPU_BACKLOG) to kfree_skb_reason(), so the existing
infrastructure should work just fine.
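
As a rough illustration of leaning on that infrastructure: the drop reason
is reported by the skb:kfree_skb tracepoint, so backlog drops can be counted
from its text output in user space. The sketch below is not kernel code. The
helper names is_backlog_drop() and count_backlog_drops() are hypothetical,
and it assumes each event line contains "kfree_skb:" and ends in
"reason: CPU_BACKLOG", which is how recent kernels appear to print it; check
the tracepoint's format file on your kernel before relying on this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a trace line looks like a kfree_skb
 * event whose drop reason is CPU_BACKLOG, else 0.
 * Assumption: the event prints its reason as "reason: CPU_BACKLOG".
 */
int is_backlog_drop(const char *line)
{
	return strstr(line, "kfree_skb:") != NULL &&
	       strstr(line, "reason: CPU_BACKLOG") != NULL;
}

/*
 * Hypothetical helper: count matching events in a text stream.
 * Lines longer than the buffer are read in pieces and may be missed.
 */
unsigned long count_backlog_drops(FILE *in)
{
	char line[1024];
	unsigned long n = 0;

	assert(in != NULL);
	while (fgets(line, sizeof(line), in))
		n += (unsigned long)is_backlog_drop(line);
	return n;
}
```

A caller could pass stdin to count_backlog_drops() while piping in the trace
text, which shows where the backlog limit is being hit without touching
/proc/net/softnet_stat at all.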


>
> Similarly, I have decided to write two more tracepoints, one for
> time_squeeze and one for budget_squeeze, which I introduced to
> distinguish it from time_squeeze, as the link below shows:
> https://lore.kernel.org/lkml/CAL+tcoAwodpnE2NjMLPhBbmHUvmKMgSykqx0EQ4YZaQHjrx0Hw@xxxxxxxxxxxxxx/.
> For that change, any suggestions are very welcome :)
>

For your workloads to hit these limits often enough for you to be worried,
it looks like you are not using any of the scaling mechanisms documented
in Documentation/networking/scaling.rst.
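
For example, that document describes RPS, which is configured by writing a
hexadecimal CPU bitmap to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. A
minimal sketch of building such a bitmap follows. The helper name
rps_mask_hex() is hypothetical, and it is deliberately limited to CPUs 0..31;
larger systems need comma-separated 32-bit words, which this does not handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a hex CPU bitmap covering CPUs 0..31,
 * in the form expected by an rps_cpus file.
 * Returns 0 on success, -1 on an out-of-range CPU or a too-small buffer.
 */
int rps_mask_hex(const int *cpus, size_t n, char *buf, size_t len)
{
	uint32_t mask = 0;
	size_t i;
	int ret;

	assert(buf != NULL);
	for (i = 0; i < n; i++) {
		if (cpus[i] < 0 || cpus[i] > 31)
			return -1;	/* needs the multi-word format */
		mask |= UINT32_C(1) << cpus[i];
	}
	ret = snprintf(buf, len, "%x", (unsigned int)mask);
	return (ret < 0 || (size_t)ret >= len) ? -1 : 0;
}
```

For CPUs 0 to 3 this produces "f", which would then be written to the
rps_cpus file of the receive queue you want to spread.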