Re: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached

From: Li Li

Date: Mon Apr 20 2026 - 13:11:34 EST


On Mon, Apr 20, 2026 at 1:23 AM Kwapulinski, Piotr
<piotr.kwapulinski@xxxxxxxxx> wrote:
>
> >-----Original Message-----
> >From: Intel-wired-lan <intel-wired-lan-bounces@xxxxxxxxxx> On Behalf Of Li Li via Intel-wired-lan
> >Sent: Sunday, April 19, 2026 9:26 PM
> >To: Nguyen, Anthony L <anthony.l.nguyen@xxxxxxxxx>; Kitszel, Przemyslaw <przemyslaw.kitszel@xxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Eric Dumazet <edumazet@xxxxxxxxxx>; intel-wired-lan@xxxxxxxxxxxxxxxx
> >Cc: netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; David Decotigny <decot@xxxxxxxxxx>; Singhai, Anjali <anjali.singhai@xxxxxxxxx>; Samudrala, Sridhar <sridhar.samudrala@xxxxxxxxx>; Brian Vazquez <brianvv@xxxxxxxxxx>; Li Li <boolli@xxxxxxxxxx>; Tantilov, Emil S <emil.s.tantilov@xxxxxxxxx>
> >Subject: [Intel-wired-lan] [PATCH] idpf: do not perform flow ops when netdev is detached
> >
> >Even though commit 2e281e1155fc ("idpf: detach and close netdevs while handling a reset") prevents ethtool -N/-n operations to operate on detached netdevs, we found that out-of-tree workflows like OpenOnload can bypass ethtool core locks and call idpf_set_rxnfc directly during an idpf HW reset. When this happens, we could get kernel crashes like the following:
> >
> >[ 4045.787439] BUG: kernel NULL pointer dereference, address: 0000000000000070 [ 4045.794420] #PF: supervisor read access in kernel mode [ 4045.799580] #PF: error_code(0x0000) - not-present page [ 4045.804739] PGD 0 [ 4045.806772] Oops: Oops: 0000 [#1] SMP NOPTI ...
> >[ 4045.836425] Workqueue: onload-wqueue oof_do_deferred_work_fn [onload] [ 4045.842926] RIP: 0010:idpf_del_flow_steer+0x24/0x170 [idpf] ...
> >[ 4045.946323] Call Trace:
> >[ 4045.948796] <TASK>
> >[ 4045.950915] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.955293] ? show_trace_log_lvl+0x1b0/0x2f0 [ 4045.959672] ? idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.964142] ? __die_body.cold+0x8/0x12 [ 4045.968000] ? page_fault_oops+0x148/0x160 [ 4045.972117] ? exc_page_fault+0x6f/0x160 [ 4045.976060] ? asm_exc_page_fault+0x22/0x30 [ 4045.980262] ? idpf_del_flow_steer+0x24/0x170 [idpf] [ 4045.985245] idpf_set_rxnfc+0x6f/0x80 [idpf] [ 4045.989535] af_xdp_filter_remove+0x7c/0xb0 [sfc_resource] [ 4045.995069] oo_hw_filter_clear_hwports+0x6f/0xa0 [onload] [ 4046.000589] oo_hw_filter_update+0x65/0x210 [onload] [ 4046.005587] oof_hw_filter_update.constprop.0+0xe7/0x140 [onload] [ 4046.011716] oof_manager_update_all_filters+0xad/0x270 [onload] [ 4046.017671] __oof_do_deferred_work+0x15e/0x190 [onload] [ 4046.023014] oof_do_deferred_work+0x2c/0x40 [onload] [ 4046.028018] oof_do_deferred_work_fn+0x12/0x30 [onload] [ 4046.033277] process_one_work+0x174/0x330 [ 4046.037304] worker_thread+0x246/0x390 [ 4046.041074] ? __pfx_worker_thread+0x10/0x10 [ 4046.045364] kthread+0xf6/0x240 [ 4046.048530] ? __pfx_kthread+0x10/0x10 [ 4046.052297] ret_from_fork+0x2d/0x50 [ 4046.055896] ? __pfx_kthread+0x10/0x10 [ 4046.059664] ret_from_fork_asm+0x1a/0x30 [ 4046.063613] </TASK>
> >
> >To prevent this, we need to add checks in idpf_set_rxnfc and idpf_get_rxnfc to error out if the netdev is already detached.
> >
> >Tested: implemented the following patch to synthetically force idpf into a HW reset:
> >
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >index 4fc0bb14c5b1..27476d57bcf0 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
> >@@ -10,6 +10,9 @@
> > #define idpf_tx_buf_next(buf) (*(u32 *)&(buf)->priv)
> > LIBETH_SQE_CHECK_PRIV(u32);
> >
> >+static bool SIMULATE_TX_TIMEOUT;
> >+module_param(SIMULATE_TX_TIMEOUT, bool, 0644);
> >+
> > /**
> > * idpf_chk_linearize - Check if skb exceeds max descriptors per packet
> > * @skb: send buffer
> >@@ -46,6 +49,8 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
> >
> > adapter->tx_timeout_count++;
> >
> >+ SIMULATE_TX_TIMEOUT = false;
> >+
> > netdev_err(netdev, "Detected Tx timeout: Count %d, Queue %d\n",
> > adapter->tx_timeout_count, txqueue);
> > if (!idpf_is_reset_in_prog(adapter)) { @@ -2225,6 +2230,8 @@ static bool idpf_tx_clean_complq(struct idpf_compl_queue *complq, int budget,
> > goto fetch_next_desc;
> > }
> > tx_q = complq->txq_grp->txqs[rel_tx_qid];
> >+ if (unlikely(SIMULATE_TX_TIMEOUT && (tx_q->idx % 2 == 1)))
> >+ goto fetch_next_desc;
> >
> > /* Determine completion type */
> > ctype = le16_get_bits(tx_desc->common.qid_comptype_gen,
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >index be66f9b2e101..ba5da2a86c15 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> >@@ -8,6 +8,9 @@
> > #include "idpf_virtchnl.h"
> > #include "idpf_ptp.h"
> >
> >+static bool VIRTCHNL_FAILED;
> >+module_param(VIRTCHNL_FAILED, bool, 0644);
> >+
> > /**
> > * struct idpf_vc_xn_manager - Manager for tracking transactions
> > * @ring: backing and lookup for transactions @@ -3496,6 +3499,11 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
> > switch (adapter->state) {
> > case __IDPF_VER_CHECK:
> > err = idpf_send_ver_msg(adapter);
> >+
> >+ if (unlikely(VIRTCHNL_FAILED)) {
> >+ err = -EIO;
> >+ }
> Please remove redundant parenthesis
> Piotr

Hi Piotr,

The block you are commenting on is not part of the patch; it's just a
block of test code in the commit message I used to reproduce the
failures.

Thanks!

>
> >+
> > switch (err) {
> > case 0:
> > /* success, move state machine forward */
> >
> >And tested by writing 1 to /sys/module/idpf/parameters/VIRTCHNL_FAILED
> >and /sys/module/idpf/parameters/SIMULATE_TX_TIMEOUT, and running
> >idpf_get_rxnfc() right after the HW reset.
> >
> >Without the patch: encountered NULL pointer and kernel crash.
> >
> >With the patch: no crashes.
> >
> >Fixes: 2e281e1155fc ("idpf: detach and close netdevs while handling a reset")
> >Signed-off-by: Li Li <boolli@xxxxxxxxxx>
> >---
> > drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> >diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >index bb99d9e7c65d..8368a7e6a754 100644
> >--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> >@@ -43,6 +43,9 @@ static int idpf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
> > unsigned int cnt = 0;
> > int err = 0;
> >
> >+ if (!netdev || !netif_device_present(netdev))
> >+ return -ENODEV;
> >+
> > idpf_vport_ctrl_lock(netdev);
> > vport = idpf_netdev_to_vport(netdev);
> > vport_config = np->adapter->vport_config[np->vport_idx];
> >@@ -349,6 +352,9 @@ static int idpf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd) {
> > int ret = -EOPNOTSUPP;
> >
> >+ if (!netdev || !netif_device_present(netdev))
> >+ return -ENODEV;
> >+
> > idpf_vport_ctrl_lock(netdev);
> > switch (cmd->cmd) {
> > case ETHTOOL_SRXCLSRLINS:
> >--
> >2.54.0.rc1.513.gad8abe7a5a-goog