Re: [PATCH net 1/2] net/mlx5e: psp: Fix invalid access on PSP dev registration fail
From: Cosmin Ratiu
Date: Tue Apr 21 2026 - 08:31:33 EST
On Mon, 2026-04-20 at 10:09 -0700, Jakub Kicinski wrote:
> On Mon, 20 Apr 2026 10:30:46 +0000 Cosmin Ratiu wrote:
> > > When psp_dev_create() fails, this function now returns without
> > > setting
> > > psp->psp, leaving it as NULL. However, priv->psp remains
> > > allocated
> > > and
> > > non-NULL.
> > >
> > > Does this leave the RX datapath vulnerable to a NULL pointer
> > > dereference?
> > >
> > > If priv->psp is non-NULL, the NIC RX initialization path can
> > > still
> > > call
> > > mlx5_accel_psp_fs_init_rx_tables(), which creates hardware flow
> > > steering
> > > rules to intercept UDP traffic.
> > >
> > > If a UDP packet triggers these rules, the hardware flags the CQE
> > > with
> > > MLX5E_PSP_MARKER_BIT. The RX fast-path sees the marker and
> > > invokes
> > > mlx5e_psp_offload_handle_rx_skb(), which dereferences the pointer
> > > unconditionally:
> > >
> > > u16 dev_id = priv->psp->psp->id;
> > >
> > > Since priv->psp->psp is NULL, this will cause a kernel panic.
> > > Should
> > > priv->psp be cleaned up, or the error propagated, to prevent flow
> > > rules
> > > from being installed when registration fails?
> >
> > First, this is preexisting. But more importantly, it's impossible
> > to
> > trigger:
> > - with no PSP devs, there can be no PSP SAs installed.
> > - with no SAs, PSP decryption cannot succeed.
> > - all unsuccessfully decrypted PSP packets are dropped by steering.
> > - the RX handler will not see any PSP packets with the marker set.
> >
> > This patch fixes the comparatively way more likely scenario of
> > psp_dev_register failing and then mlx5e_psp_unregister passing the
> > error pointer to psp_dev_unregister, which will do unpleasant
> > things
> > with it.
>
> Sure but why are you leaving the priv->psp struct in place and
> whatever
> FS init has been done? IOW if you really want PSP init to not block
> probe why is mlx5e_psp_register() a void function rather than
> mlx5e_psp_init() ? Ignoring errors from psp_dev_create()
> makes no sense to me - what are you protecting from?
> kmalloc(GFP_KERNEL)
> failing?
priv->psp and steering at the time of mlx5e_psp_register() is inert
without the PSP device. Cleaning it on psp_dev_create() failure would
be weird, it's cleaned up anyway on netdev teardown. The fact that only
memory allocations can fail inside psp_dev_create() is irrelevant here.
psp_dev_create() failing shouldn't bring down the whole netdevice, so
logging a message and continuing is ok (which is what is also done for
macsec and ktls).
mlx5e_psp_register() is void because it's called from
mlx5e_nic_enable() which can't fail, so it really can't do much other
than complain to dmesg.
But while thinking about this, I suppose we could change the entire PSP
initialization to happen at the time of the current
mlx5e_psp_register(), and that would simplify the number of states.
I will do that in the next planned PSP series for net-next.
Meanwhile, could you please take the 2nd patch and leave this one out?
It should apply with no conflicts by itself.
Or you would like to see a separate submission with the 2nd patch
alone?
Cosmin.