Re: [PATCH v4 net 4/4] net: enetc: disable NAPI after all rings are disabled

From: Vladimir Oltean
Date: Fri Oct 11 2024 - 10:42:28 EST


On Thu, Oct 10, 2024 at 05:20:56PM +0800, Wei Fang wrote:
> When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> on LS1028A, it was found that if the command was re-run multiple times,
> Rx could not receive the frames, and the result of xdp-bench showed
> that the rx rate was 0.
>
> root@ls1028ardb:~# ./xdp-bench tx eno0
> Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> Summary 2046 rx/s 0 err,drop/s
> Summary 0 rx/s 0 err,drop/s
> Summary 0 rx/s 0 err,drop/s
> Summary 0 rx/s 0 err,drop/s
>
> By observing the Rx PIR and CIR registers, CIR is always 0x7FF and
> PIR is always 0x7FE, which means that the Rx ring is full and can no
> longer accommodate other Rx frames. Therefore, the problem is caused
> by the Rx BD ring not being cleaned up.
>
> Further analysis of the code revealed that the Rx BD ring will only
> be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> Therefore, some debug logs were added to the driver and the current
> values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> BD ring was full. The logs are as follows.
>
> [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
>
> From the results, the max value of xdp_tx_in_flight has reached 2140.
> However, the size of the Rx BD ring is only 2048. So xdp_tx_in_flight
> did not drop to 0 after enetc_stop() is called and the driver does not
> clear it. The root cause is that NAPI is disabled too aggressively,
> without having waited for the pending XDP_TX frames to be transmitted,
> and their buffers recycled, so that xdp_tx_in_flight cannot naturally
> drop to 0. Later, enetc_free_tx_ring() does free those stale, unsent
> XDP_TX packets, but it is not coded up to also reset xdp_tx_in_flight,
> hence the manifestation of the bug.
>
> One option would be to cover this extra condition in enetc_free_tx_ring(),
> but now that the ENETC_TX_DOWN exists, we have created a window at
> the beginning of enetc_stop() where NAPI can still be scheduled, but
> any concurrent enqueue will be blocked. Therefore, enetc_wait_bdrs()
> and enetc_disable_tx_bdrs() can be called with NAPI still scheduled,
> and it is guaranteed that this will not wait indefinitely, but instead
> give us an indication that the pending TX frames have orderly dropped
> to zero. Only then should we call napi_disable().
>
> This way, enetc_free_tx_ring() becomes entirely redundant and can be
> dropped as part of subsequent cleanup.
>
> The change also refactors enetc_start() so that it looks like the
> mirror opposite procedure of enetc_stop().
>
> Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet processing")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Wei Fang <wei.fang@xxxxxxx>
> ---
> v2 changes:
> 1. Modify the titile and rephrase the commit meesage.
> 2. Use the new solution as described in the title
> v3: no changes.
> v4 changes:
> 1. Modify the title and rephrase the commit message.
> ---

Reviewed-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
Tested-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>