Re: [PATCH v20 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements
From: Ratheesh Kannoth
Date: Wed Jun 10 2026 - 00:20:35 EST
On 2026-06-09 at 09:34:46, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> Improve MCAM visibility and field debugging for CN20K NPC.
>
> - Extend "mcam_layout" to show enabled (+) or disabled state per entry
> so status can be verified without parsing the full "mcam_entry" dump.
> - Add "dstats" debugfs entry: for enabled MCAM indices, print hit deltas
> since the prior read by comparing hardware counters to a per-entry
> software baseline and advancing that baseline after each read (hardware
> counters are not cleared).
> - Add "mismatch" debugfs entry: lists MCAM entries that are enabled
> but not explicitly allocated, helping diagnose allocation/field issues.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>
https://sashiko.dev/#/patchset/20260609040453.711932-1-rkannoth%40marvell.com says
>> +static DEFINE_MUTEX(stats_lock);
>> +
>> +/* MAX_NUM_BANKS, MAX_SUBBANK_DEPTH and MAX_NUM_SUB_BANKS represent
>> + * hard limit on all silicon variants, preventing any possibility of
>> + * out-of-bounds access.
>> + */
>> +static u64 dstats[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS] = {};
>> +static int npc_mcam_dstats_show(struct seq_file *s, void *unused)
>> +{
>[ ... ]
>> + stats = rvu_read64(rvu, blkaddr,
>> + NPC_AF_CN20K_MCAMEX_BANKX_STAT_EXT(idx, bank));
>> + if (!stats)
>> + continue;
>> + if (stats == dstats[bank][idx])
>> + continue;
>> +
>> + if (stats < dstats[bank][idx])
>> + dstats[bank][idx] = 0;
>Can stale dstats baselines persist across MCAM entry reallocation and
>silently corrupt hit telemetry?
>The dstats array maintains a per-entry software baseline. When an entry is
>freed, its hardware hit counter is reset to 0 in npc_clear_x2_entry(), but
>its corresponding dstats baseline is never cleared.
>If the entry is later reallocated and its hardware counter accumulates more
>hits than the old stale baseline before the next debugfs read, this fallback
>logic evaluates to false. The calculation will then incorrectly subtract
>the stale baseline, massively under-reporting the actual hits.
>Should the baseline be explicitly cleared synchronously when the entry
>is freed?
Thanks for pointing this out. The dstats matrix is intended purely
as a lightweight telemetry tool to identify which MCAM entries are
actively being hit by current traffic via debugfs, rather than
providing strict, transactionally guaranteed billing or accounting
metrics.
You are correct that a stale baseline could cause a transient
under-reporting on the very first read after a reallocation if the
hardware counter quickly surpasses the old baseline. However,
during that first read, the fallback logic will immediately catch
up and synchronize dstats[bank][idx] to the current hardware state
(stats). All subsequent reads will reflect completely accurate hit
increments.
Given that this is non-critical debugfs telemetry, i would like to
keep this as it is. or fix it as part of "net" tree.
>> +
>> + pf = 0xFFFF;
>> + map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
>> + if (map)
>> + pf = xa_to_value(map);
>> +
>> + delta = stats - dstats[bank][idx];
>> +
>> + snprintf(buff, sizeof(buff), "%u\t%#04x\t%llu\n",
>> + mcam_idx, pf, delta);
>> + seq_puts(s, buff);
>> +
>> + dstats[bank][idx] = stats;
>Does this global baseline destructively corrupt telemetry under concurrent
>reads?
>When any process reads the debugfs file, it updates dstats[bank][idx],
>destructively advancing the baseline for all other readers. If a monitoring
>daemon reads the file periodically and a user manually reads it, the manual
>read consumes the delta and resets the baseline, causing the monitoring
>daemon to miss all hits that occurred in between.
>Should telemetry exposed to userspace either provide raw cumulative counters
>or track baselines per-file-descriptor context?
Same comment as above.
>> + mutex_unlock(&stats_lock);
>> + return 0;
>> +}
>> +
>> +/* "%u\t%#04x\t%llu\n" needs less than 64 characters to print */
>> +#define TOTAL_SZ (MAX_NUM_BANKS * MAX_NUM_SUB_BANKS * MAX_SUBBANK_DEPTH * 64)
>> +DEFINE_OCTEONTX2_DEBUGFS_ATTRIBUTE_WITH_SIZE(npc_mcam_dstats, TOTAL_SZ);
>Does this bypass the chunking design of seq_file and cause unconditionally
>large allocations?
>By calling single_open_size() with a hardcoded TOTAL_SZ of roughly 1MB
>(16,384 entries * 64 bytes), this unconditionally allocates 1MB of kernel
>memory via kvmalloc every time the file is opened.
>Instead of dumping everything in a single show() call, would it be better
>to implement the seq_file iterator callbacks (start, next, show, stop) to
>process the MCAM entries in page-sized chunks?
I dont see any issue with current approach and would like to keep it.