[no subject]
From: Anshumali Gaur
Date: Mon Feb 02 2026 - 06:00:31 EST
On 2026-01-29 at 23:02:43, Jacob Keller (jacob.e.keller@xxxxxxxxx) wrote:
>
>
> On 1/29/2026 1:19 AM, Anshumali Gaur wrote:
> > When both AF and PF drivers are built as modules, the PF driver in the
> > kexec kernel may probe before the AF driver is ready. This leads to
> > a crash due to uninitialized hardware state.
> >
> > This patch ensures the PF driver properly detects and waits for AF
> > driver readiness before proceeding with initialization.
> >
>
> To me, the patch description is not sufficient to describe the what and why
> of this change.
>
> Could you please provide a better explanation of how the addition of the
> provided shutdown handler fixes initialization?
>
Hi Jacob,

The issue being addressed here is specific to kexec, where the AF
hardware state persists across the kernel transition. When both the AF
and PF drivers are built as modules and the system is rebooted via
kexec, the PF driver in the new kernel may probe before the AF driver
has finished probing and reinitializing the RVU hardware. In that case,
the hardware state left behind by the AF driver of the old kernel is
still visible to the PF driver in the new kernel, which results in a
crash due to the stale state.
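To illustrate the ordering problem, here is a deliberately simplified
userspace model, not driver code. All names are hypothetical, the RVUM
revision is reduced to one byte, and a "kexec boot" is modelled as a
boot that does not reset the hardware, while a cold boot does:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: the RVU hardware keeps its register contents
 * across a kexec (no cold reset), but a cold boot clears them.
 */
struct rvu_hw_model {
	uint8_t rvum_revid; /* written by the AF driver during probe */
	bool af_ready;      /* true once the current kernel's AF has probed */
};

enum boot_kind { COLD_BOOT, KEXEC_BOOT };

/* Start a new kernel: only a cold boot resets the hardware register. */
static void model_boot(struct rvu_hw_model *hw, enum boot_kind kind)
{
	if (kind == COLD_BOOT)
		hw->rvum_revid = 0;
	hw->af_ready = false; /* the new kernel's AF driver has not probed yet */
}

/* AF probe: reinitialize the hardware and publish a revision. */
static void model_af_probe(struct rvu_hw_model *hw)
{
	hw->rvum_revid = 1; /* placeholder value, not the real revision */
	hw->af_ready = true;
}

/*
 * PF probe that trusts the revision field. Returns true when the PF
 * goes ahead although the AF in this kernel has not yet reinitialized
 * the hardware, i.e. the situation that leads to the crash.
 */
static bool model_pf_probe_hits_stale_state(const struct rvu_hw_model *hw)
{
	bool pf_proceeds = hw->rvum_revid != 0;

	return pf_proceeds && !hw->af_ready;
}
```

After a cold boot the PF sees a zero revision and waits; after a kexec
boot it sees the old kernel's value and proceeds too early.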
> > Fixes: 54494aa5d1e6 ("octeontx2-af: Add Marvell OcteonTX2 RVU AF driver")
> > Signed-off-by: Anshumali Gaur <agaur@xxxxxxxxxxx>
> > ---
> > drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
> > index 747fbdf2a908..8530df8b3fda 100644
> > --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
> > +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
> > @@ -3632,11 +3632,22 @@ static void rvu_remove(struct pci_dev *pdev)
> >  	devm_kfree(&pdev->dev, rvu);
> >  }
> >  
> > +static void rvu_shutdown(struct pci_dev *pdev)
> > +{
> > +	struct rvu *rvu = pci_get_drvdata(pdev);
> > +
> > +	if (!rvu)
> > +		return;
> > +
> > +	rvu_clear_rvum_blk_revid(rvu);
>
> Here, I guess you are clearing some data about the device status. Does that
> mean that when you initialize later you will wait for the AF driver to
> finish probing and configure this? It would be nice to explain how this
> change fixes initialization.
>
The RVUM block revision field acts as an implicit indication that the AF
driver has completed its initialization. If this value is not cleared
before kexec boots the new kernel, the PF driver there may observe a
non-zero (valid) RVUM block revision and incorrectly assume that the AF
is already initialized and ready, even though the AF driver in the new
kernel has not probed yet. PF initialization then proceeds against
partially initialized hardware, resulting in a crash. Clearing the
revision from the shutdown handler, which runs on the kexec path before
the new kernel starts, means the PF driver in the new kernel sees a zero
revision and waits until the new AF driver has probed and set it again.
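A rough sketch of this handshake, again as a hypothetical userspace
model. It assumes the PF reads the revision from a block discovery
register (bits 19:12 are an assumption for illustration) and defers its
probe with -EPROBE_DEFER while the field is zero. The af_*/pf_* helpers
are invented names, not the driver's functions:

```c
#include <stdint.h>

/* Kernel-internal errno value; not exported to userspace headers. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical model of the discovery register seen by the PF. */
struct rvum_regs {
	uint64_t block_disc; /* revision assumed to live in bits 19:12 */
};

#define RVUM_REVID_SHIFT 12
#define RVUM_REVID_MASK  0xFFULL

/* AF side: publish a revision once initialization is done. */
static void af_set_revid(struct rvum_regs *r, uint8_t revid)
{
	r->block_disc &= ~(RVUM_REVID_MASK << RVUM_REVID_SHIFT);
	r->block_disc |= (uint64_t)revid << RVUM_REVID_SHIFT;
}

/* AF side: roughly what the proposed shutdown handler does. */
static void af_clear_revid(struct rvum_regs *r)
{
	af_set_revid(r, 0);
}

/* PF side: defer probing until the AF has published a revision. */
static int pf_check_af_ready(const struct rvum_regs *r)
{
	uint64_t rev = (r->block_disc >> RVUM_REVID_SHIFT) & RVUM_REVID_MASK;

	return rev ? 0 : -EPROBE_DEFER;
}
```

With the shutdown hook in place, the sequence set (old AF probe), clear
(shutdown before kexec), set (new AF probe) keeps the PF deferring until
the last step instead of trusting the value left by the old kernel.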
> > +}
> > +
> >  static struct pci_driver rvu_driver = {
> >  	.name = DRV_NAME,
> >  	.id_table = rvu_id_table,
> >  	.probe = rvu_probe,
> >  	.remove = rvu_remove,
> > +	.shutdown = rvu_shutdown,
>
> This is the shutdown handler:
> >
> > * @shutdown: Hook into reboot_notifier_list (kernel/sys.c).
> > * Intended to stop any idling DMA operations.
> > * Useful for enabling wake-on-lan (NIC) or changing
> > * the power state of a device before reboot.
> > * e.g. drivers/net/e100.c.
>
> How does this have anything to do with initialization?
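On the link to initialization: on a regular kexec (without
preserve_context), kernel_kexec() calls kernel_restart_prepare(), which
runs device_shutdown(). That invokes each bound PCI driver's .shutdown
callback before machine_kexec() jumps into the new kernel, so .shutdown
is where the old kernel's AF driver can reset state that the new kernel
would otherwise inherit. The model below is a hypothetical userspace
sketch of that call pattern, not the actual driver core code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a device and its driver. */
struct model_dev;

struct model_driver {
	const char *name;
	void (*shutdown)(struct model_dev *dev); /* optional, may be NULL */
};

struct model_dev {
	const struct model_driver *drv;
	uint8_t rvum_revid; /* hardware state that survives kexec */
};

/* Mirrors the idea of device_shutdown(): call each bound driver's hook. */
static void model_device_shutdown(struct model_dev **devs, size_t n)
{
	/* Walk devices from last to first, as device_shutdown() does. */
	for (size_t i = n; i-- > 0;) {
		struct model_dev *dev = devs[i];

		if (dev->drv && dev->drv->shutdown)
			dev->drv->shutdown(dev);
	}
}

static void model_rvu_shutdown(struct model_dev *dev)
{
	dev->rvum_revid = 0; /* analogous to rvu_clear_rvum_blk_revid() */
}

static const struct model_driver model_rvu_driver = {
	.name = "rvu_af",
	.shutdown = model_rvu_shutdown,
};

static const struct model_driver model_no_hook_driver = {
	.name = "no_shutdown_hook",
	.shutdown = NULL,
};
```

A driver without a .shutdown hook leaves its hardware state untouched
across kexec, which is the situation the patch changes for the AF.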
>
> >  };
> >  
> >  static int __init rvu_init_module(void)
>
>