Re: [RFC net-next v4 2/2] net/mlx5e: Add per queue netdev-genl stats

From: Tariq Toukan
Date: Fri Jun 07 2024 - 03:54:13 EST




On 07/06/2024 4:02, Joe Damato wrote:
On Thu, Jun 06, 2024 at 05:19:42PM -0700, Jakub Kicinski wrote:
On Thu, 6 Jun 2024 14:54:40 -0700 Joe Damato wrote:
Compare the values in /proc/net/dev match the output of cli for the same
device, even while the device is down.

Note that while the device is down, per queue stats output nothing
(because the device is down there are no queues):

This part is not true anymore.

It is true with this patch applied and running the command below.
Maybe I should have been more explicit that using cli.py outputs []
when scope = queue, which could be an internal cli.py thing, but
this is definitely true with this patch.

Did you test it and get different results?

To avoid drivers having their own interpretations what "closed" means,
core hides all queues in closed state:

https://elixir.bootlin.com/linux/v6.10-rc1/source/net/core/netdev-genl.c#L582

PTP RQ index is naively assigned to zero:
rq->ix = MLX5E_PTP_CHANNEL_IX;

but this isn't to be used as the stats index.
Today, the PTP-RQ has no matcing rxq in the kernel level.
i.e. turning PTP-RQ on won't add a kernel-level RXQ to the
real_num_rx_queues.
Maybe we better do.
If not, and the current state is kept, the best we can do is let the PTP-RQ
naively contribute its queue-stat to channel 0.

OK, it sounds like the easiest thing to do is just count PTP as
channel 0, so if i == 0, I'll in the PTP stats.

But please see below regarding testing whether or not PTP is
actually enabled or not.

If we can I think we should avoid making queue 0 too special.
If someone configures steering and only expects certain packets on
queue 0 - getting PTP counted there will be a surprise.
I vote to always count it towards base.

I'm OK with reporting PTP RX in base and only in base.

But, that would then leave PTP TX:


Right, currently there's no consistency between the PTP RX/TX accountment in real_num_queues.
I don't want to create more work for you, but IMO in the longterm I should follow it up with a patch that adds PTP-RX to real_num_rx_queues.

PTP TX stats are reported in mlx5e_get_queue_stats_tx because
the user will pass in an 'i' which refers to the PTP txq. This works
fine with the mlx5e_get_queue_stats_tx code as-is because the PTP
txqs are mapped in the new priv->txq2sq_stats array.

However.... if PTP is enabled and then disabled by the user, that
leaves us in this state:

priv->tx_ptp_opened && !test_bit(MLX5E_PTP_STATE_TX, channels.ptp->state)

e.g. PTP TX was opened at some point but is currently disabled as
the bit is unset.

In this case, when the txq2sq_stats map is built, it'll exclude PTP
stats struct from that mapping if MLX5E_PTP_STATE_TX is not set.

So, in this case, the stats have to be reported in base with
something like this (psuedo code):
if (priv->tx_ptp_opened &&
! test_bit(MLX5E_PTP_STATE_TX, channels.ptp->state)) {
for (tc = 0; tc < priv->channels.ptp->num_tc; tc++) {

Do not take num_tc from here, this ptp memory might not exist at this point.
Calculate it the regular way with max_opened_tc.

tx->packets += ...ptp_stats.sq[tc].packets;
tx->bytes += ...ptp_stats.sq[tc].bytes;
}
}

Right? Or am I just way off here?