On Mon, Dec 16, 2024 at 01:47:16AM -0500, Faizal Rahim wrote:
Created fpe_t struct to store MAC Merge data and implement the
"ethtool --set-mm" callback. The fpe_t struct will host other frame
preemption related data in future patches.
The following fields are used to set IGC register:
a) pmac_enabled -> TQAVCTRL.PREEMPT_ENA
This global register sets the preemption scheme, controlling
preemption capabilities in transmit and receive directions, as well as
the verification handshake capability.
I'm sorry, I'm not able to mentally translate this explanation into
something technical. Which capabilities are we talking about, that this
bit controls? I'm not clear what it does. The kernel-doc description of
pmac_enabled is much more succinct (and at the same time, appears to
contradict this much more elaborate yet unclear description).
b) tx_min_frag_size -> TQAVCTRL.MIN_FRAG
Global register to set minimum fragments.
When you say "global register", you mean global as opposed to what?
Per station interface?
I initially stop execution, but then I figured if the other parameters are set correctly, the feature could still work since this field has a valid default value set.diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c
index 817838677817..1954561ec4aa 100644
--- a/drivers/net/ethernet/intel/igc/igc_ethtool.c
+++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c
@@ -8,6 +8,7 @@
#include "igc.h"
#include "igc_diag.h"
+#include "igc_tsn.h"
/* forward declaration */
struct igc_stats {
@@ -1781,6 +1782,34 @@ static int igc_ethtool_set_eee(struct net_device *netdev,
return 0;
}
+static int igc_ethtool_set_mm(struct net_device *netdev,
+ struct ethtool_mm_cfg *cmd,
+ struct netlink_ext_ack *extack)
+{
+ struct igc_adapter *adapter = netdev_priv(netdev);
+ struct fpe_t *fpe = &adapter->fpe;
+
+ if (cmd->tx_min_frag_size < IGC_TX_MIN_FRAG_SIZE ||
+ cmd->tx_min_frag_size > IGC_TX_MAX_FRAG_SIZE)
+ NL_SET_ERR_MSG_MOD(extack,
+ "Invalid value for tx-min-frag-size");
Shouldn't the execution actually stop here with an error code?
+ else
+ fpe->verify_time = cmd->verify_time;
+
+ fpe->tx_enabled = cmd->tx_enabled;
+ fpe->pmac_enabled = cmd->pmac_enabled;
+ fpe->verify_enabled = cmd->verify_enabled;
+
+ return igc_tsn_offload_apply(adapter);
hmm, igc_tsn_offload_apply() is a function which always returns zero.
It seems more natural to make it return void.
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c
index 5cd54ce435b9..b968c02f5fee 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.c
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.c
@@ -194,12 +210,22 @@ static void igc_tsn_set_retx_qbvfullthreshold(struct igc_adapter *adapter)
wr32(IGC_RETX_CTL, retxctl);
}
+static u8 igc_fpe_get_frag_size_mult(const struct fpe_t *fpe)
+{
+ u32 tx_min_frag_size = fpe->tx_min_frag_size;
+ u8 mult = (tx_min_frag_size / 64) - 1;
+
+ return clamp_t(u8, mult, IGC_MIN_FOR_TX_MIN_FRAG,
+ IGC_MAX_FOR_TX_MIN_FRAG);
+}
If you translate the continuous range of TX fragment sizes into
discrete multipliers because that's what the hardware works with, why
don't you just reject the non-multiple values using
ethtool_mm_frag_size_min_to_add(), and at the same time use the output
of that function directly to obtain your multiplier? IIUC it gets you
the same result.
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.h b/drivers/net/ethernet/intel/igc/igc_tsn.h
index 98ec845a86bf..08e7582f257e 100644
--- a/drivers/net/ethernet/intel/igc/igc_tsn.h
+++ b/drivers/net/ethernet/intel/igc/igc_tsn.h
@@ -4,6 +4,15 @@
#ifndef _IGC_TSN_H_
#define _IGC_TSN_H_
+/* IGC_TX_MIN_FRAG_SIZE is based on the MIN_FRAG field in Section 8.12.2 of the
+ * SW User Manual.
+ */
+#define IGC_TX_MIN_FRAG_SIZE 68
+#define IGC_TX_MAX_FRAG_SIZE 260
Odd. Is there a link to this manual (for I225 I suppose)? Standard values are 60, 124, 188, 252.
Maybe the methodology for calculating these is used here? As things stand,
if the driver reports these values, IIUC, openlldp gets confused and communicates
a LLDP_8023_ADD_ETH_CAPS_ADD_FRAG_SIZE value to the link partner which is higher
than it could have been (68 is rounded up to the next standard TX fragment size,
which is 124). So the link partner will preempt in larger chunks and this will
not reduce latency as much.
Regarding the TX minimum fragment size, I’m currently looking into it. After
speaking with the i226 hardware owners (Shalev Avi), I realized I may have
misunderstood the fragment size details. Avi mentioned that the i226
supports fragment sizes of 64, 128, 192, and 256 bytes (without mCRC).
However, these values are still 4 bytes larger than the standard you
mentioned.
Yes, the standard I mention is 802.1Q section 99.4.4 Transmit processing:
The earliest starting position of preemption is controlled by the addFragSize
variable. Preemption does not occur until at least 64 x (1 + addFragSize) – 4
octets of the preemptable frame have been sent. The addFragSize variable is set
to the value of the addFragSize field in the received Additional Ethernet
Capabilities TLV (see 79.3.7).
The preemptableFragSize state machine variable says that the MAC can
preempt as soon as enough octets have been put on to wire so as to have
a fixed minimum length (by default, a valid Ethernet packet). You're
saying that the i226 MAC performs preemption at least 4 octets later
than the standard says it could.
Avi provided the following explanation for these sizes:
"The i226 always performs preemption on 16-byte boundaries. Once we read a
16-byte line from the internal packet buffer memory, we transmit it entirely
before deciding on preemption. This avoids keeping partial bytes from the
16-byte line, ensuring that we continue with the preempted frame only after
finishing the current 16-byte line."
User space assumes that, when it gets an addFragSize over LLDP from the
link partner, it can program the local transmitter to preempt at that
fragment size, and that the operation will succeed. The formula to
translate from addFragSize to preemptableFragSize is standard. There is
no mechanism built into the UAPI for user space to guess, if programming
a standard value failed, what other custom value could work.
I guess that leaves 2 options:
(1) accept and report the standard values, but "secretly" preempt later
than expected
(2) accept any values in the discrete range, and round them up to the
first value supported by the NIC, then report that non-standard
value. Then keep this as a quirk isolated to the current generation
of NICs, and be better with new drivers which accept standard values
and don't do any rounding.
I think I prefer seeing the latter variant. I'm trying to think if this
is going to cause problems with openlldp or with the lldp_change_add_frag_size()
selftest, but I think it's generally safe. That test only checks that
LLDP reacts to the addFragSize of the link partner by programming its
local addFragSize to the same value. It doesn't check that the
tx-min-frag-size is exactly equal to the formula derived from that
addFragSize.