Re: [PATCH v5 net 04/10] octeontx2-af: npc: cn20k: Fix target map and rule

From: Ratheesh Kannoth

Date: Thu Apr 30 2026 - 00:16:40 EST


On 2026-04-29 at 07:57:16, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> npc_defrag_move_vdx_to_free() disables, copies, and enables the MCAM entry
> at a new index but previously left entry2target_pffunc[] and the mcam_rules
> list still keyed to the old index. Copy the target PF association to the
> new slot, clear the old one, and retarget the rule entry so software state
> matches the relocated hardware context.
>
> Fixes: 645c6e3c1999 ("octeontx2-af: npc: cn20k: virtual index support")
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>

@@ -3723,8 +3724,21 @@ int npc_defrag_move_vdx_to_free(struct rvu *rvu,
> mcam->entry2pfvf_map[new_midx] = pcifunc;
> /* Counter is not preserved */
> mcam->entry2cntr_map[new_midx] = new_midx;
> + target_pf = mcam->entry2target_pffunc[old_midx];
>Is mcam->entry2target_pffunc always initialized before we read from it here?
>If it was allocated with kmalloc_array() during driver initialization and no
>target PF was explicitly configured for this MCAM entry, could this read
>uninitialized heap memory and copy it to the new index?

The npc_defrag_move_vdx_to_free function only operates on entries that are currently 'in-use' and
being relocated to optimize space. Because the target PF configuration is a prerequisite for an
entry to be considered active in the MCAM pipeline, these fields are guaranteed to be
populated by the time a defrag move is triggered.

> + mcam->entry2target_pffunc[new_midx] = target_pf;
> + mcam->entry2target_pffunc[old_midx] = NPC_MCAM_INVALID_MAP;
> +
> npc_mcam_set_bit(mcam, new_midx);
>
> + /* Note: list order is not functionally required for mcam_rules */
> + list_for_each_entry_safe(rule, tmp, &mcam->mcam_rules, list) {
> + if (rule->entry != old_midx)
> + continue;
> +
> + rule->entry = new_midx;
>If npc_add_to_pf_maps() fails earlier in this loop, the function returns an
>error without updating these software tracking maps. At that failure point,
>the hardware rule at old_midx has already been disabled and its subbank marked
>as free via __npc_subbank_free().
>Since the hardware subbank is free, could the hardware allocation later
>assign old_midx to a completely new user while our mcam_rules list still
>points to it?

In this specific 'should-never-happen' scenario, attempting a graceful software rewind could mask a critical fault or lead to
further exceptions. My intention with this Smatch fix was to ensure smatch fix, is at least caught and logged rather than silently ignored.
I can address a full transactional rollback mechanism in a separate hardening series for net-next. But issue is gracefull rollback
can also fail as it is kind of another defrag process. or may be we can copy to new entries but not free the existing entries, that would be a
enhancement request for net-next.

>If the old rule's owner later modifies or deletes it, could this cause a
>resource collision and inadvertently destroy the newly allocated rule?
No this wont happen, mbox messages are processed by AF driver sequentially. More than that
mcam->lock mutex is already acquired before all these operations.