Re: [PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics

From: Jesper Dangaard Brouer
Date: Mon Nov 29 2021 - 09:02:04 EST




On 27/11/2021 00.01, Daniel Borkmann wrote:
On 11/26/21 11:27 PM, Daniel Borkmann wrote:
On 11/26/21 7:06 PM, Jakub Kicinski wrote:
[...]
The information required by the admin is higher level. As you say the
primary concern there is "how many packets did XDP eat".

Agree. That said, for XDP_DROP I would see one use case where you compare
different drivers, or bond vs. no bond as we did in the past in [0], when
testing against a packet generator (although I don't see the bond driver covered
in this series yet, where it would aggregate the XDP stats from all bond slave devs).

On a higher level wrt "how many packets did XDP eat", it would make sense
to have stats for successful XDP_{TX,REDIRECT}, given these are out
of reach from a BPF prog PoV - we can only count how many times we
returned XDP_TX, but not whether the pkt /successfully made it/.


Exactly.
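
To illustrate (a minimal sketch, entirely mine - map and prog names are
made up): an XDP prog can count its own verdicts in a per-CPU map, but
that only tells us how often it *returned* XDP_TX; a later driver-side
xmit failure (e.g. TX ring full) never reaches the prog:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Per-CPU counter per XDP verdict, maintained by the prog itself. */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, XDP_REDIRECT + 1);
	__type(key, __u32);
	__type(value, __u64);
} verdict_cnt SEC(".maps");

SEC("xdp")
int xdp_count_verdicts(struct xdp_md *ctx)
{
	__u32 act = XDP_TX;	/* the verdict we are about to return */
	__u64 *cnt;

	cnt = bpf_map_lookup_elem(&verdict_cnt, &act);
	if (cnt)
		(*cnt)++;

	/* The driver may still fail the actual xmit after this point,
	 * which this prog-side counter can never see.
	 */
	return act;
}

char _license[] SEC("license") = "GPL";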

In terms of error cases, could we just standardize all drivers on the behavior
of e.g. mlx5e_xdp_handle(), meaning a failure from XDP_{TX,REDIRECT} would
hit trace_xdp_exception() and then fall through to bump a drop counter
(the same one we bump for XDP_DROP)? So the drop counter would account for
program drops but also driver-related drops.


Hmm... I don't agree here. IMHO the BPF-program's *choice* to drop (via XDP_DROP) should NOT share a counter with the driver-related drops.

The driver-related drops must be accounted separately.

For the record, I think mlx5e_xdp_handle() does the wrong thing by accounting everything as XDP_DROP (rq->stats->xdp_drop++).

Current mlx5 driver stats are highly problematic actually.
Please don't model stats behavior after this driver.

E.g. if the BPF-prog takes the *choice* XDP_TX or XDP_REDIRECT or XDP_DROP, then the packet is invisible to "ifconfig" stats. It is as if the driver never received these packets (which is wrong IMHO). (The stats are only available via ethtool -S).


At some later point the trace_xdp_exception() could be extended with an error
code that the driver would propagate (given some of them look quite similar
across drivers, fwiw), and then whoever wants to do further processing with them can do so via bpftrace or other tooling.

I do like that trace_xdp_exception() is invoked in mlx5e_xdp_handle(), but do notice that xdp_do_redirect() also has a tracepoint that can be used for troubleshooting. (I usually use xdp_monitor for troubleshooting, which catches both).

I like the stats XDP handling better in mvneta_run_xdp().
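
Roughly the kind of per-verdict accounting I mean (a sketch only - the
field names and the driver_xdp_xmit() helper are illustrative, not the
exact mvneta code): the prog's *choice* to drop stays separate from
driver-side failures, instead of everything folding into one xdp_drop
counter:

	switch (act) {
	case XDP_PASS:
		stats->xdp_pass++;
		break;
	case XDP_TX:
		if (driver_xdp_xmit(rq, xdp) < 0)	/* e.g. TX ring full */
			stats->xdp_tx_err++;
		else
			stats->xdp_tx++;
		break;
	case XDP_REDIRECT:
		if (xdp_do_redirect(netdev, xdp, prog) < 0)
			stats->xdp_redirect_err++;
		else
			stats->xdp_redirect++;
		break;
	case XDP_DROP:
		stats->xdp_drop++;		/* BPF-prog's own choice */
		break;
	default:
		bpf_warn_invalid_xdp_action(act);
		fallthrough;
	case XDP_ABORTED:
		trace_xdp_exception(netdev, prog, act);
		stats->xdp_aborted++;
		break;
	}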

Just thinking out loud, one straightforward example we could start out with, which is also related to Paolo's series [1] ...

enum xdp_error {
    XDP_UNKNOWN,
    XDP_ACTION_INVALID,
    XDP_ACTION_UNSUPPORTED,
};

... and then bpf_warn_invalid_xdp_action() returns one of the latter two
which we pass to trace_xdp_exception(). Later there could be XDP_DRIVER_*
cases e.g. propagated from XDP_TX error exceptions.

        [...]
        default:
                err = bpf_warn_invalid_xdp_action(act);
                fallthrough;
        case XDP_ABORTED:
xdp_abort:
                trace_xdp_exception(rq->netdev, prog, act, err);
                fallthrough;
        case XDP_DROP:
                lrstats->xdp_drop++;
                break;
        }
        [...]

  [1] https://lore.kernel.org/netdev/cover.1637924200.git.pabeni@xxxxxxxxxx/
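
A rough sketch of how that return value could look (hypothetical - today
bpf_warn_invalid_xdp_action() returns void, and the XDP_ACTION_* values
only exist in the enum above); the result would then be the 'err' passed
to the extended trace_xdp_exception() in the snippet above:

	enum xdp_error bpf_warn_invalid_xdp_action(u32 act)
	{
		const u32 act_max = XDP_REDIRECT;

		pr_warn_once("%s XDP return value %u, expect packet loss!\n",
			     act > act_max ? "Invalid" : "Unsupported", act);

		return act > act_max ? XDP_ACTION_INVALID :
				       XDP_ACTION_UNSUPPORTED;
	}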

So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
bytes & packets counters that XDP sees, (driver-)successful tx & redirect
counters as well as drop counter. Also, XDP bytes & packets counters should
not be counted twice wrt ethtool stats.

   [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e
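
If I read that right, what would remain per channel is roughly this (a
sketch, field names are mine, not the series' actual struct):

	struct xdp_channel_stats {
		u64 packets;	/* all frames seen by the XDP prog */
		u64 bytes;	/* bytes of those frames */
		u64 tx;		/* XDP_TX frames the driver actually xmitted */
		u64 redirect;	/* successfully redirected frames */
		u64 drop;	/* XDP_DROP returned by the prog */
	};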

Concrete example with mlx5:

For most other hardware (than mlx5) I experience that XDP_TX creates a push-back on NIC RX-handling speed. Thus, the XDP_TX stats recorded by the BPF-prog are usually correct.

With mlx5 hardware (tested on ConnectX-5 Ex MT28800) the RX packets-per-sec (pps) rate can easily be higher than the number of frames actually transmitted via XDP_TX.

$ sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_TX
[...]
Running XDP on dev:mlx5p1 (ifindex:10) action:XDP_TX options:swapmac
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      1       13,922,430  0
XDP-RX CPU      total   13,922,430

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index  1:1     13,922,430  0
rx_queue_index  1:sum   13,922,430

The real xmit speed (from ethtool_stats.pl below) is
 9,391,314 pps <= rx1_xdp_tx_xmit /sec

The dropped packets are double accounted, showing up in both counters:
 4,552,033 <= rx_xdp_drop /sec
 4,552,033 <= rx_xdp_tx_full /sec

(Notice: 9,391,314 xmitted + 4,552,033 dropped ~= 13.9M pps, which roughly matches the RX rate seen above.)


Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1 ) stat: 217865 ( 217,865) <= ch1_poll /sec
Ethtool(mlx5p1 ) stat: 217864 ( 217,864) <= ch_poll /sec
Ethtool(mlx5p1 ) stat: 13943371 ( 13,943,371) <= rx1_cache_reuse /sec
Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx1_xdp_drop /sec
Ethtool(mlx5p1 ) stat: 146740 ( 146,740) <= rx1_xdp_tx_cqes /sec
Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx1_xdp_tx_full /sec
Ethtool(mlx5p1 ) stat: 9391314 ( 9,391,314) <= rx1_xdp_tx_inlnw /sec
Ethtool(mlx5p1 ) stat: 880436 ( 880,436) <= rx1_xdp_tx_mpwqe /sec
Ethtool(mlx5p1 ) stat: 997833 ( 997,833) <= rx1_xdp_tx_nops /sec
Ethtool(mlx5p1 ) stat: 9391314 ( 9,391,314) <= rx1_xdp_tx_xmit /sec
Ethtool(mlx5p1 ) stat: 45095173 ( 45,095,173) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1 ) stat: 2886090490 ( 2,886,090,490) <= rx_bytes_phy /sec
Ethtool(mlx5p1 ) stat: 13943293 ( 13,943,293) <= rx_cache_reuse /sec
Ethtool(mlx5p1 ) stat: 31151957 ( 31,151,957) <= rx_out_of_buffer /sec
Ethtool(mlx5p1 ) stat: 45095158 ( 45,095,158) <= rx_packets_phy /sec
Ethtool(mlx5p1 ) stat: 2886072350 ( 2,886,072,350) <= rx_prio0_bytes /sec
Ethtool(mlx5p1 ) stat: 45094878 ( 45,094,878) <= rx_prio0_packets /sec
Ethtool(mlx5p1 ) stat: 2705707938 ( 2,705,707,938) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1 ) stat: 45095129 ( 45,095,129) <= rx_vport_unicast_packets /sec
Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx_xdp_drop /sec
Ethtool(mlx5p1 ) stat: 146739 ( 146,739) <= rx_xdp_tx_cqe /sec
Ethtool(mlx5p1 ) stat: 4552033 ( 4,552,033) <= rx_xdp_tx_full /sec
Ethtool(mlx5p1 ) stat: 9391319 ( 9,391,319) <= rx_xdp_tx_inlnw /sec
Ethtool(mlx5p1 ) stat: 880436 ( 880,436) <= rx_xdp_tx_mpwqe /sec
Ethtool(mlx5p1 ) stat: 997831 ( 997,831) <= rx_xdp_tx_nops /sec
Ethtool(mlx5p1 ) stat: 9391319 ( 9,391,319) <= rx_xdp_tx_xmit /sec
Ethtool(mlx5p1 ) stat: 601044221 ( 601,044,221) <= tx_bytes_phy /sec
Ethtool(mlx5p1 ) stat: 9391316 ( 9,391,316) <= tx_packets_phy /sec
Ethtool(mlx5p1 ) stat: 601040871 ( 601,040,871) <= tx_prio0_bytes /sec
Ethtool(mlx5p1 ) stat: 9391264 ( 9,391,264) <= tx_prio0_packets /sec
Ethtool(mlx5p1 ) stat: 563478483 ( 563,478,483) <= tx_vport_unicast_bytes /sec
Ethtool(mlx5p1 ) stat: 9391316 ( 9,391,316) <= tx_vport_unicast_packets /sec



[1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl


The net_device stats say the NIC is processing zero packets:

$ sar -n DEV 2 1000
[...]
Average:    IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
[...]
Average:   mlx5p1      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
Average:   mlx5p2      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
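
What I would like to see instead (a minimal sketch, not existing mlx5
code - the rq->stats field names are illustrative): count the frame in
the normal RX ring stats *before* running the prog, so it stays visible
in ip -s link / sar regardless of the XDP verdict, while the per-verdict
XDP counters stay separate:

	/* In the driver RX poll loop, before the XDP verdict: */
	rq->stats.rx_packets++;
	rq->stats.rx_bytes += xdp->data_end - xdp->data;

	act = bpf_prog_run_xdp(prog, xdp);
	/* ... then the per-verdict switch as sketched earlier ... */

These per-queue counters would then be folded into the driver's
ndo_get_stats64(), same as for the non-XDP path.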