Re:
From: Jacob Keller
Date: Mon Feb 02 2026 - 19:34:45 EST
On 2/2/2026 2:53 AM, Anshumali Gaur wrote:
On 2026-01-29 at 23:02:43, Jacob Keller (jacob.e.keller@xxxxxxxxx) wrote:
Hi Jacob,
On 1/29/2026 1:19 AM, Anshumali Gaur wrote:
When both AF and PF drivers are built as modules, the PF driver in the
kexec kernel may probe before the AF driver is ready. This leads to
a crash due to uninitialized hardware state.
This patch ensures the PF driver properly detects and waits for AF
driver readiness before proceeding with initialization.
To me, the patch description is not sufficient to describe the what and why
of this change.
Could you please provide a better explanation of how the addition of the
provided shutdown handler fixes initialization?
The issue being addressed here is specific to kexec and persistent AF
hardware state across kernel transitions. When both AF and PF drivers
are built as modules and a kexec is performed, the PF driver in
the new kernel may probe before the AF driver has completed probing and
reinitializing the RVU hardware. In this scenario, the hardware state
left behind by the AF driver in the old kernel is still visible to the
PF driver in the new kernel, resulting in a crash due to stale state.
The RVUM block revision field acts as an implicit indication that the AF
Fixes: 54494aa5d1e6 ("octeontx2-af: Add Marvell OcteonTX2 RVU AF driver")
Signed-off-by: Anshumali Gaur <agaur@xxxxxxxxxxx>
---
drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 747fbdf2a908..8530df8b3fda 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -3632,11 +3632,22 @@ static void rvu_remove(struct pci_dev *pdev)
devm_kfree(&pdev->dev, rvu);
}
+static void rvu_shutdown(struct pci_dev *pdev)
+{
+ struct rvu *rvu = pci_get_drvdata(pdev);
+
+ if (!rvu)
+ return;
+
+ rvu_clear_rvum_blk_revid(rvu);
Here, I guess you are clearing some data about the device status. Does that
mean that when you initialize later you will wait for the AF driver to
finish probing and configure this? It would be nice to explain how this
change fixes initialization.
driver has completed its initialization. If this value is left uncleared
during kexec kernel booting, the PF driver may observe a non-zero/valid
RVUM block revision and incorrectly assume that the AF is already
initialized and ready, even though the AF driver in the kexec kernel has
not yet probed. This leads to PF initialization proceeding against
partially initialized hardware, resulting in a crash.
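To make the ordering problem concrete, here is a minimal userspace sketch of the handshake described above. It is not the driver code: the variable fake_rvum_blk_revid, the enum pf_status, and the functions af_probe(), af_shutdown() and pf_probe() are hypothetical stand-ins, and the sketch assumes the PF side treats a zero revision as "AF not ready" and defers its probe.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RVUM block revision field. In real
 * hardware this value lives in the device and survives a kexec. */
static uint64_t fake_rvum_blk_revid;

/* Hypothetical result codes for the simulated PF probe. */
enum pf_status { PF_READY, PF_DEFER };

/* Simulated AF probe: after reinitializing the hardware, publish a
 * non-zero revision to signal readiness. */
static void af_probe(void)
{
	fake_rvum_blk_revid = 0x1;
}

/* Simulated AF shutdown: clear the revision so the next kernel does not
 * see a stale "ready" indication. */
static void af_shutdown(void)
{
	fake_rvum_blk_revid = 0;
}

/* Simulated PF probe: proceed only if the AF has published a revision
 * (assumption: zero means "not ready"). */
static enum pf_status pf_probe(void)
{
	if (fake_rvum_blk_revid == 0)
		return PF_DEFER;
	return PF_READY;
}
```

In this model, calling af_probe() in the "old kernel" and then pf_probe() in the "new kernel" without an intervening af_shutdown() returns PF_READY even though the new kernel's AF has not probed, which is the stale-state case. With af_shutdown() called before the transition, pf_probe() returns PF_DEFER until af_probe() runs again.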
Makes sense. When shutting down you need to explicitly clear the stale data so that booting up without a power cycle (as in the kexec case) does not pick it up.
I'd appreciate a little more of this detail in the commit message personally. However, functionally it makes sense, so:
Reviewed-by: Jacob Keller <jacob.e.keller@xxxxxxxxx>