Re: [PATCH net-next v3] net: dqs: add NIC stall detector based on BQL

From: Eric Dumazet
Date: Tue Feb 13 2024 - 08:58:14 EST


On Fri, Feb 2, 2024 at 5:55 PM Breno Leitao <leitao@xxxxxxxxxx> wrote:
>
> From: Jakub Kicinski <kuba@xxxxxxxxxx>
>
> softnet_data->time_squeeze is sometimes used as a proxy for
> host overload or indication of scheduling problems. In practice
> this statistic is very noisy and has hard to grasp units -
> e.g. is 10 squeezes a second to be expected, or high?
>
> Delaying network (NAPI) processing leads to drops on NIC queues
> but also RTT bloat, impacting pacing and CA decisions.
> Stalls are a little hard to detect on the Rx side, because
> there may simply have not been any packets received in given
> period of time. Packet timestamps help a little bit, but
> again we don't know if packets are stale because we're
> not keeping up or because someone (*cough* cgroups)
> disabled IRQs for a long time.

Please note that adding more sysfs entries is expensive for workloads
that create and delete netdevs and netns often.

I _think_ we should find a way to avoid creating the
/sys/class/net/<interface>/queues/tx-{Q}/byte_queue_limits directory
and its files for non-BQL-enabled devices (like loopback!)
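
To get a rough idea of how many such entries a host carries, something
like the following could be used. This is only a sketch: the helper name
count_bql_entries and its sysfs_net_root parameter are made up for
illustration. It just counts what is visible under
queues/tx-*/byte_queue_limits for each interface and makes no attempt to
tell whether the driver actually uses BQL.

```python
import os


def count_bql_entries(sysfs_net_root="/sys/class/net"):
    """Return {interface: number of entries under queues/tx-*/byte_queue_limits}.

    Hypothetical helper for illustration; it only walks the directory tree.
    """
    counts = {}
    if not os.path.isdir(sysfs_net_root):
        return counts
    for ifname in sorted(os.listdir(sysfs_net_root)):
        queues = os.path.join(sysfs_net_root, ifname, "queues")
        # Skip non-interface entries that have no queues/ directory.
        if not os.path.isdir(queues):
            continue
        total = 0
        for queue in os.listdir(queues):
            if not queue.startswith("tx-"):
                continue
            bql_dir = os.path.join(queues, queue, "byte_queue_limits")
            if os.path.isdir(bql_dir):
                total += len(os.listdir(bql_dir))
        counts[ifname] = total
    return counts


if __name__ == "__main__":
    for ifname, n in count_bql_entries().items():
        print(f"{ifname}: {n} byte_queue_limits entries")
```

Running it on a host with many netns would need one invocation per
namespace, since each netns has its own view of /sys/class/net.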