Re: [PATCH v2 1/3] EDAC/versalnet: Fix teardown ordering in mc_remove()

From: Prasanna Kumar T S M

Date: Mon Apr 06 2026 - 01:26:31 EST

On 03-04-2026 16:04, Borislav Petkov wrote:

On Wed, Apr 01, 2026 at 04:18:36AM -0700, Prasanna Kumar T S M wrote:

The teardown sequence in mc_remove() does not mirror the reverse of the
initialization order in mc_probe(). In particular,
unregister_rpmsg_driver() is called before remove_versalnet(), and
cdx_mcdi_finish() is called after rproc_shutdown().

Reorder mc_remove() to reverse the probe initialization sequence,
consistent with the probe error-unwind paths.

The rproc reference acquired via rproc_get_by_phandle() during probe
is not released in mc_remove(), causing a reference count leak. Add
the missing rproc_put() call.

Fixes: d5fe2fec6c40 ("EDAC: Add a driver for the AMD Versal NET DDR controller")
Signed-off-by: Prasanna Kumar T S M <ptsm@xxxxxxxxxxxxxxxxxxx>
---
drivers/edac/versalnet_edac.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

Sashiko has found things, pls addres them:

https://sashiko.dev/#/patchset/20260401111836.2342918-1-ptsm%40linux.microsoft.com

I asked AI to validate Sashiko's comment. This is its output
------------------------------------------------------------

Analysis

The review comment is not valid (false positive). Here's the detailed reasoning:

Data structure hierarchy

mc_priv
└── mcdi: struct cdx_mcdi * ← kfree'd at step 6 (LAST)
├── mcdi: struct cdx_mcdi_data * ← kfree'd by cdx_mcdi_finish() at step 2
│ └── iface: struct cdx_mcdi_iface (mutex, cmd list, etc.)
├── r5_rproc
└── ...

New mc_remove() order (after commit)

(1) remove_versalnet(priv); // tear down EDAC
(2) cdx_mcdi_finish(priv->mcdi); // wait_for_cleanup → kfree(cdx->mcdi) → cdx->mcdi = NULL
(3) unregister_rpmsg_driver(&amd_rpmsg_driver); // destroys rpmsg endpoint
(4) rproc_shutdown(priv->mcdi->r5_rproc);
(5) rproc_put(priv->mcdi->r5_rproc);
(6) kfree(priv->mcdi); // frees outer struct

Why the concern doesn't hold

The reviewer's premise is wrong. The comment says "after priv->mcdi is freed" — but priv->mcdi (the struct cdx_mcdi * pointer passed to cdx_mcdi_process_cmd) is not freed
until step 6, well after unregister_rpmsg_driver() at step 3.

What IS freed at step 2 is the inner cdx->mcdi (struct cdx_mcdi_data *). But cdx_mcdi_finish() also sets cdx->mcdi = NULL. If rpmsg_cb() fires between steps 2 and 3:

1. mc_priv->mcdi → still points to valid struct cdx_mcdi (not freed until step 6)
2. cdx_mcdi_process_cmd(mc_priv->mcdi, ...) is called with a valid pointer
3. Inside, cdx_mcdi_if(cdx) checks cdx->mcdi → NULL → returns NULL
4. if (!mcdi) return; → exits early, safe

Additionally, cdx_mcdi_wait_for_cleanup() inside cdx_mcdi_finish() ensures all pending MCDI commands have completed before freeing, so no legitimate MCDI responses should
arrive afterward.

This ordering exactly matches the probe error unwind path (err_init: does cdx_mcdi_finish → kfree → unregister_rpmsg_driver → rproc_shutdown → rproc_put), which the
commit explicitly states it's aligning with.

Verdict: False positive. The NULL guard in cdx_mcdi_if() / cdx_mcdi_process_cmd() protects the window, and the outer struct priv->mcdi remains valid throughout.

------------------------------------------------------------

So the comment looks like a false positive. It will be great if someone from AMD verifies this.

Can you please look at the other 2 patches in this series? They are independent of this change.

Thanks,
Prasanna Kumar