Re: [PATCH iwl-next v8 07/11] igc: add support for frame preemption verification

From: Vladimir Oltean
Date: Wed Mar 05 2025 - 19:28:46 EST


On Wed, Mar 05, 2025 at 08:00:22AM -0500, Faizal Rahim wrote:
> b) configure_pmac() -> not used
> - this callback dynamically controls pmac_enabled at runtime. For
> example, mmsv calls configure_pmac() and disables pmac_enabled when
> the link partner goes down, even if the user previously enabled it.
> The intention is to save power but it is not feasible in igc
> because it causes an endless adapter reset loop:
>
> 1) Board A and Board B complete the verification handshake. Tx mode
> register for both boards are in TSN mode.
> 2) Board B link goes down.
>
> On Board A:
> 3) mmsv calls configure_pmac() with pmac_enabled = false.
> 4) configure_pmac() in igc updates a new field based on pmac_enabled.
> Driver uses this field in igc_tsn_new_flags() to indicate that the
> user enabled/disabled FPE.
> 5) configure_pmac() in igc calls igc_tsn_offload_apply() to check
> whether an adapter reset is needed. Calls existing logic in
> igc_tsn_will_tx_mode_change() and igc_tsn_new_flags().
> 6) Since pmac_enabled is now disabled and no other TSN feature is
> active, igc_tsn_will_tx_mode_change() evaluates to true because Tx
> mode will switch from TSN to Legacy.
> 7) Driver resets the adapter.
> 8) Registers are set, and Tx mode switches to Legacy.
> 9) When link partner is up, steps 3–8 repeat, but this time with
> pmac_enabled = true, reactivating TSN.
> igc_tsn_will_tx_mode_change() evaluates to true again, since Tx
> mode will switch from Legacy to TSN.
> 10) Driver resets the adapter.
> 11) Rest adapter completes, registers are set, and Tx mode switches to

s/Rest adapter/Adapter reset/

> TSN.
>
> On Board B:
> 12) Adapter reset on Board A at step 10 causes it to detect its link
> partner as down.
> 13) Repeats steps 3–8.
> 14) Once reset adapter on Board A is completed at step 11, it detects
> its link partner as up.
> 15) Repeats steps 9–11.
>
> - this cycle repeats indefinitely. To avoid this issue, igc only uses
> mmsv.pmac_enabled to track whether FPE is enabled or disabled.
>
> Co-developed-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
> Co-developed-by: Choong Yong Liang <yong.liang.choong@xxxxxxxxxxxxxxx>
> Signed-off-by: Choong Yong Liang <yong.liang.choong@xxxxxxxxxxxxxxx>
> Co-developed-by: Chwee-Lin Choong <chwee.lin.choong@xxxxxxxxx>
> Signed-off-by: Chwee-Lin Choong <chwee.lin.choong@xxxxxxxxx>
> Signed-off-by: Faizal Rahim <faizal.abdul.rahim@xxxxxxxxxxxxxxx>
> ---
> +static inline bool igc_fpe_is_pmac_enabled(struct igc_adapter *adapter)
> +{
> +	return static_branch_unlikely(&igc_fpe_enabled) &&
> +	       adapter->fpe.mmsv.pmac_enabled;
> +}
> +
> +static inline bool igc_fpe_is_verify_or_response(union igc_adv_rx_desc *rx_desc,
> +						 unsigned int size, void *pktbuf)
> +{
> +	u32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error);
> +	static const u8 zero_payload[SMD_FRAME_SIZE] = {0};
> +	int smd;
> +
> +	smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error);
> +
> +	return (smd == IGC_RXD_STAT_SMD_TYPE_V || smd == IGC_RXD_STAT_SMD_TYPE_R) &&
> +	       size == SMD_FRAME_SIZE &&
> +	       !memcmp(pktbuf, zero_payload, SMD_FRAME_SIZE); /* Buffer is all zeros */

Using this definition...

> +}
> +
> +static inline void igc_fpe_lp_event_status(union igc_adv_rx_desc *rx_desc,
> +					   struct ethtool_mmsv *mmsv)
> +{
> +	u32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error);
> +	int smd;
> +
> +	smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error);
> +
> +	if (smd == IGC_RXD_STAT_SMD_TYPE_V)
> +		ethtool_mmsv_event_handle(mmsv, ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET);
> +	else if (smd == IGC_RXD_STAT_SMD_TYPE_R)
> +		ethtool_mmsv_event_handle(mmsv, ETHTOOL_MMSV_LP_SENT_RESPONSE_MPACKET);
> +}
> @@ -2617,6 +2617,15 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
>  			size -= IGC_TS_HDR_LEN;
>  		}
>
> +		if (igc_fpe_is_pmac_enabled(adapter) &&
> +		    igc_fpe_is_verify_or_response(rx_desc, size, pktbuf)) {

... invalid SMD-R and SMD-V frames (wrong size or non-zero payload) will
skip this code block altogether and be passed up the network stack, where
they are visible at least in tcpdump, correct? Essentially, if the link
partner were to craft an ICMP request packet with an SMD-V or SMD-R, your
station would respond to it, which is incorrect.

Strangely, the behavior in this case seems under-specified in the
standard, and I don't see any counter that should be incremented.

> +			igc_fpe_lp_event_status(rx_desc, &adapter->fpe.mmsv);
> +			/* Advance the ring next-to-clean */
> +			igc_is_non_eop(rx_ring, rx_desc);
> +			cleaned_count++;
> +			continue;
> +		}

To fix this, don't you want to merge the unnaturally split
igc_fpe_is_verify_or_response() and igc_fpe_lp_event_status() into a
single function, which returns true whenever the mPacket should be
consumed by the driver, and decides on its own whether to emit an mmsv
event? Merging the two would also avoid reading
rx_desc->wb.upper.status_error twice.

Something like this:

static inline bool igc_fpe_handle_mpacket(struct igc_adapter *adapter,
					  union igc_adv_rx_desc *rx_desc,
					  unsigned int size, void *pktbuf)
{
	u32 status_error = le32_to_cpu(rx_desc->wb.upper.status_error);
	int smd;

	smd = FIELD_GET(IGC_RXDADV_STAT_SMD_TYPE_MASK, status_error);
	if (smd != IGC_RXD_STAT_SMD_TYPE_V && smd != IGC_RXD_STAT_SMD_TYPE_R)
		return false;

	if (size == SMD_FRAME_SIZE && mem_is_zero(pktbuf, SMD_FRAME_SIZE)) {
		struct ethtool_mmsv *mmsv = &adapter->fpe.mmsv;
		enum ethtool_mmsv_event event;

		if (smd == IGC_RXD_STAT_SMD_TYPE_V)
			event = ETHTOOL_MMSV_LP_SENT_VERIFY_MPACKET;
		else
			event = ETHTOOL_MMSV_LP_SENT_RESPONSE_MPACKET;

		ethtool_mmsv_event_handle(mmsv, event);
	}

	return true;
}

		if (igc_fpe_is_pmac_enabled(adapter) &&
		    igc_fpe_handle_mpacket(adapter, rx_desc, size, pktbuf)) {
			/* Advance the ring next-to-clean */
			igc_is_non_eop(rx_ring, rx_desc);
			cleaned_count++;
			continue;
		}
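
To make the intended behavior explicit, here is a userspace model of the
three outcomes the merged helper distinguishes. It is only a sketch: the
enum values, MODEL_SMD_FRAME_SIZE (assumed to be 60 here), all_zero() and
classify() are hypothetical stand-ins, not the igc definitions, and no
descriptor parsing is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real igc register encodings. */
enum smd_type { SMD_NONE, SMD_VERIFY, SMD_RESPONSE };
enum verdict { PASS_TO_STACK, CONSUME_SILENTLY, CONSUME_WITH_EVENT };

#define MODEL_SMD_FRAME_SIZE 60 /* assumed value, for this sketch only */

/* Single-buffer scan; returns true if all len bytes are zero. */
static bool all_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return false;
	return true;
}

/*
 * Model of the decision: anything tagged SMD-V or SMD-R is consumed by
 * the driver, but only a well-formed mPacket triggers an mmsv event.
 */
static enum verdict classify(enum smd_type smd, const unsigned char *buf,
			     size_t len)
{
	assert(buf || !len);

	if (smd != SMD_VERIFY && smd != SMD_RESPONSE)
		return PASS_TO_STACK;

	if (len == MODEL_SMD_FRAME_SIZE && all_zero(buf, len))
		return CONSUME_WITH_EVENT;

	return CONSUME_SILENTLY;
}
```

With the split helpers from the patch, the CONSUME_SILENTLY case does not
exist: a malformed SMD-V or SMD-R frame ends up in the PASS_TO_STACK
bucket instead.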

[ Also note the use of mem_is_zero() instead of memcmp() against a buffer
pre-filled with zeroes. It should be more efficient, for the simple
reason that it accesses a single memory buffer rather than two. Though
I'm surprised how widespread the memcmp() pattern is throughout the
kernel. ]
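
As a rough userspace illustration of the difference between the two
patterns (a sketch only: FRAME_LEN, is_zero_memcmp() and is_zero_scan()
are hypothetical names, and is_zero_scan() is a plain byte loop written
for this example, not the kernel's mem_is_zero() implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FRAME_LEN 60 /* assumed length, for this sketch only */

/* Two-buffer pattern: reads the frame and a static array of zeroes. */
static bool is_zero_memcmp(const void *buf)
{
	static const unsigned char zeros[FRAME_LEN]; /* zero-initialized */

	assert(buf);
	return memcmp(buf, zeros, FRAME_LEN) == 0;
}

/* Single-buffer pattern: reads only the frame, and works for any length. */
static bool is_zero_scan(const void *buf, size_t len)
{
	const unsigned char *p = buf;

	assert(buf || !len);
	while (len--)
		if (*p++)
			return false;
	return true;
}
```

Both return the same answer for a FRAME_LEN-byte buffer; the second one
needs no reference array and is not tied to a compile-time size.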