Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

From: Jason Xing
Date: Tue Mar 14 2023 - 05:22:42 EST


On Tue, Mar 14, 2023 at 4:41 PM Jesper Dangaard Brouer
<jbrouer@xxxxxxxxxx> wrote:
>
>
> On 14/03/2023 02.57, Jason Xing wrote:
> > On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <sinquersw@xxxxxxxxx> wrote:
> >>
> >> On 3/11/23 08:36, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@xxxxxxxxxxx>
> >>>
> >>> When we encounter some performance issue and then get lost on how
> >>> to tune the budget limit and time limit in net_rx_action() function,
> >>> we can separately counting both of them to avoid the confusion.
> >>>
> >>> Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> >>> ---
> >>> note: this commit is based on the link as below:
> >>> https://lore.kernel.org/lkml/20230311151756.83302-1-kerneljasonxing@xxxxxxxxx/
> >>> ---
> [...]
> >>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> >>> index 97a304e1957a..4d1a499d7c43 100644
> >>> --- a/net/core/net-procfs.c
> >>> +++ b/net/core/net-procfs.c
> >>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >>> */
> >>> seq_printf(seq,
> >>> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> >>> - "%08x %08x\n",
> >>> - sd->processed, sd->dropped, sd->time_squeeze, 0,
> >>> + "%08x %08x %08x %08x\n",
> >>> + sd->processed, sd->dropped,
> >>> + 0, /* was old way to count time squeeze */
> >>
> >> Should we show a proximate number? For example,
> >> sd->time_squeeze + sd->bud_squeeze.
> >
> > Yeah, It does make sense. Let the old way to display untouched.
> >
>
[...]
> Yes, I don't think we can/should remove this squeeze stat because
> several tools e.g. my own[1] captures these stats (and I know Willem
> also have his own tool).
> I like the sd->time_squeeze + sd->budget_squeeze suggestion.

So do I. Therefore I followed this suggestion in the next submission.

[1]
https://lore.kernel.org/lkml/20230314030532.9238-3-kerneljasonxing@xxxxxxxxx/

>
> [1]
> https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl
>
>
> >>
> >>
> >>> + 0,
> >>> 0, 0, 0, 0, /* was fastroute */
> >>> 0, /* was cpu_collision */
> >>> sd->received_rps, flow_limit_count,
> >>> 0, /* was len of two backlog queues */
> >>> (int)seq->index,
> >>> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> >>> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> >>> + sd->time_squeeze, sd->budget_squeeze);
> >>> return 0;
> >>> }
> >>>
>
[...]
> We recently had a very long troubleshooting session around a latency
> issue (Cc Simon) where we used the tool[1]. The issue was NIC hardware
> RX queue was backlogged, but we didn't see any squeeze events, which
> confused us. (This happens because budget was 300 and two NICs using 64
> budget each doesn't exceed 300).

I recently found some users running on our production environment hit
the time_squeeze very often which aroused my interests.
Env:
1) budget is 300;
2) eth0 is virtio_net which only registers 32 input interrupts (32
queue pairs) with a larger number of cpus online.

>
> We were/are missing another counter to tell us net_rx_action() "repoll"
> is happening (as code !list_empty(&repoll)). That were the case and it
> would have "told" us that hardware RX ring was full (larger than 64).
>
> We worked around this limitation by using the tracepoint for napi_poll,
> and manually deduced that 64 bulking must mean that "repoll" were happening.
>
> Oneliner bpftrace script:
>
> bpftrace -e 'tracepoint:napi:napi_poll {
> @napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }'
>
> We used this script (that also measures softirq latency):
>
>
> https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt
>
>
[...]
> I do wonder is it would be valuable to *also* add a tracepoint to
> net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and
> repoll-not-empty.

I believe it's useful that we can show more details in softnet_data,
but I'm confused about how to display them.
This morning I submitted one patch[1] and chose to do such things when
reading the softnet_stat file.

Could we add more data in the softnet_stat file while also tracing
those three important points? I'm not sure.

Thanks,
Jason

>
> --Jesper
>