Re: [PATCH v14 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control

From: Paolo Abeni

Date: Tue May 19 2026 - 05:17:24 EST

On 5/15/26 9:26 AM, Ratheesh Kannoth wrote:
> On 2026-05-14 at 11:55:33, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
>>> + restrict_valid = false;
>>> +
>>> + for (prio = 0; prio < cnt; prio++) {
>>> + sb_idx = narr[prio];
>>> + sb = &npc_priv.sb[sb_idx];
>>> + sb->arr_idx = prio;
>>> +
>>> + /* erase entry if exist */
>>> + xa_erase(&npc_priv.xa_sb_free, sb->arr_idx);
>>> + xa_erase(&npc_priv.xa_sb_used, sb->arr_idx);
>>> +
>>> + xa = &npc_priv.xa_sb_free;
>>> + if (sb->flags & NPC_SUBBANK_FLAG_USED)
>>> + xa = &npc_priv.xa_sb_used;
>>> +
>>> + /* if xa_store() fails due to memory pressure, there is
>>> + * no point in rolling back, as roll back may also fail
>>> + */
>>> + rc = xa_err(xa_store(xa, sb->arr_idx,
>>> + xa_mk_value(sb_idx), GFP_KERNEL));
>>> + if (rc) {
>>> + dev_err(rvu->dev,
>>> + "Setting arr_idx=%d for sb=%d failed\n",
>>> + sb->arr_idx, sb_idx);
>>> + goto fail;
>>> + }
>>> + }
>> Does this error path leak subbank resources and leave the xarrays in a
>> corrupted state?
>> If xa_store() fails, the function jumps to the fail label and aborts
>> without restoring the original state. Because the entry at the current
>> priority index was already erased via xa_erase(), but the new subbank
>> failed to store, that subbank is permanently lost.
>> Additionally, the subbank that failed to store is left with a modified
>> arr_idx pointing to an empty slot, potentially breaking future allocations.
>> Could this function preallocate resources (such as using xa_reserve()) or
>> build temporary xarrays to ensure the changes can be applied atomically
>> without a destructive intermediate state?
>
> The omission of a transactional rollback path is intentional due to
> the systemic implications of a GFP_KERNEL allocation failure inside
> the xarray core. If xa_store() returns an ENOMEM error, the underlying
> slab allocator is already exhausted, meaning any programmatic attempt
> to erase or restore original indices would also fail, compounding the
> unpredictable state.
>
> Because this devlink callback operates under the top-level mcam->lock
> mutex, no concurrent allocation or translation worker can read the
> partially modified xarrays while the driver is in this degraded state.
> An xa_store() failure indicates a fatal, system-wide memory leak or
> exhaustion event. At this threshold, the driver prints a critical
> error payload to dmesg to allow deterministic post-mortem debugging of
> the host's global memory subsystem, rather than attempting an unreliable
> in-kernel rollback.

I'm sorry, but I don't agree with your statement above: allocation may
fail under memory pressure, and memory pressure can be transient; it
should not brick the driver requiring a reload.

Additionally I think it should be possible to handle gracefully the
failure inverting the order of xa_store() and xa_erase() and performing
the latter only on the relevant xarray (i.e. xa_sb_used when store
operates on xa_sb_free, and vice versa).

/P