Re: [PATCH v20 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.

From: Ratheesh Kannoth

Date: Wed Jun 10 2026 - 00:42:14 EST


On 2026-06-09 at 09:34:53, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> Replace the file-scope static npc_priv with a kcalloc'd struct filled
> from hardware bank/subbank geometry at init (num_banks is no longer a
> const compile-time constant; drop init_done and use a non-NULL
> npc_priv pointer for liveness). Thread npc_priv_get() / pointer access
> through the CN20K NPC code paths, extend teardown to kfree the root
> struct on failure and in npc_cn20k_deinit, and adjust MCAM section
> setup to use the discovered subbank count.
>
> Allocate MCAM debugfs dstats via devm_kzalloc instead of a static matrix,
> and use the allocated backing store consistently when computing deltas
> (including the counter rollover compare).
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>

https://sashiko.dev/#/patchset/20260609040453.711932-1-rkannoth%40marvell.com says:

> @@ -397,6 +398,10 @@ int npc_cn20k_debugfs_init(struct rvu *rvu)
>> debugfs_create_file("vidx2idx", 0444, rvu->rvu_dbg.npc,
>> npc_priv, &npc_vidx2idx_map_fops);
>>
>> + dstats = devm_kzalloc(rvu->dev, sizeof(*dstats), GFP_KERNEL);
>> + if (!dstats)
>> + return -ENOMEM;
>Does allocating this memory dynamically per-device but assigning it to a
>static global pointer break multi-device support?

Exist only one AF device per Soc

>If there are multiple RVU devices, the second device will overwrite the
>global dstats pointer, which leaks the devm-managed memory of the first
>device.
No.

>Also, because it uses devm_kzalloc(), if the specific device unbinds, the
>global pointer is freed and left dangling, potentially causing a
>Use-After-Free for the remaining active devices.

Next rvu probe() would allocate this memory again during probe(). Use will happen only after allocation.
As part of hardening series (post merge of this series), will plan to move this
pointer to "rvu" structure.

>> /* mcam_idx should be less than (2 * bank depth) */
>> - if (mcam_idx >= npc_priv.bank_depth * 2) {
>> + if (mcam_idx >= npc_priv->bank_depth * 2) {
>Can npc_priv be NULL here?
>If a mailbox message is processed before npc_cn20k_init() fully allocates
>npc_priv, or if initialization failed and left it NULL, this will cause a
>kernel panic. Should there be a check to ensure npc_priv is allocated
>before dereferencing?

mbox message is from PF/VF. PF/VF driver need AF driver to intialize properly. if npc_priv is NULL,
AF driver probe will fail as npc_priv is allocated during AF driver probe.

>> @@ -1825,37 +1823,37 @@ int npc_mcam_idx_2_subbank_idx(struct rvu *rvu, u16 mcam_idx,
>> int bank_off, sb_id;
>>
>> /* mcam_idx should be less than (2 * bank depth) */
>> - if (mcam_idx >= npc_priv.bank_depth * 2) {
>> + if (mcam_idx >= npc_priv->bank_depth * 2) {
>Can this crash if npc_priv is NULL here?
>Similar to the above, unconditionally dereferencing npc_priv here might
>lead to a NULL pointer dereference.
Same comment as above

>> @@ -4251,8 +4249,8 @@ int rvu_mbox_handler_npc_get_pfl_info(struct rvu *rvu, struct msg_req *req,
>> return -EOPNOTSUPP;
>> }
>>
>> - rsp->kw_type = npc_priv.kw;
>> - rsp->x4_slots = npc_priv.bank_depth;
>> + rsp->kw_type = npc_priv->kw;
>> + rsp->x4_slots = npc_priv->bank_depth;
>> return 0;
>Is it possible for rvu_mbox_handler_npc_get_pfl_info() to be called before
>npc_priv is successfully allocated?
No

>If this mailbox handler fires while npc_priv is NULL, it will panic. Before
>this patch, accessing the uninitialized static struct safely returned zeros.
Mailbox request is from PF. PF driver will fail if AF is not intialized properly.
npc_priv is allocated during AF driver probe().

>> - npc_priv.num_subbanks = num_subbanks;
>> - npc_priv.bank_depth = bank_depth;
>> - npc_priv.subbank_depth = subbank_depth;
>> + npc_priv = kcalloc(1, sizeof(*npc_priv), GFP_KERNEL);
>Could this overwrite the global npc_priv pointer if multiple RVU devices are
>present?
First patch in the series enforce only one AF device per system.

>
>> + npc_defrag_list_clear();
>> + kfree(npc_priv);
>> + npc_priv = NULL;
>> }
>Will setting the static global npc_priv to NULL during teardown crash other
>active devices?
>If multiple devices are present and one is unbound, it will free the shared
>global npc_priv pointer and set it to NULL. The other active devices will
>then crash the next time they attempt to dereference it.

Same comment as above.