Re: [PATCH v16 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem

From: Ratheesh Kannoth

Date: Fri May 22 2026 - 07:55:58 EST

On 2026-05-21 at 15:23:01, Ratheesh Kannoth (rkannoth@xxxxxxxxxxx) wrote:
> Flashing updated firmware on deployed devices is cumbersome. Provide a
> mechanism to load a custom KPU (Key Parse Unit) profile directly from
> the filesystem at module load time.

>This commit introduces a mechanism to load custom KPU profiles from the
>filesystem at module load time. It adds a kpu_profile module parameter
>to read profiles from /lib/firmware/kpu.
>> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
>> index 0e1744609ccfd..513e68711962f 100644
>> --- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
>> +++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
>> @@ -1676,8 +1692,15 @@ int npc_cn20k_apply_custom_kpu(struct rvu *rvu,
>> }
>>
>> for (kpu = 0; kpu < fw->kpus; kpu++) {
>> fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
>> if (fw_kpu->entries > KPU_CN20K_MAX_CST_ENT)
>This is a pre-existing issue, but missing a check for negative entries here
>allows bypassing bounds checks. If fw_kpu->entries is negative, multiplying it
>by sizeof(*cam) causes an unsigned wraparound. The offset decreases,
>potentially wrapping around the 64-bit address space to bypass bounds checks
>later, leading to arbitrary backward out-of-bounds reads. Could we add a check
>to ensure entries is non-negative?
>This is also a pre-existing issue, but does missing a bounds check before
>dereferencing the firmware structure fields lead to an out-of-bounds read?
>The bounds check for rvu->kpu_fwdata_sz < hdr_sz + offset happens later, but
>fw_kpu is located at fw->data + offset. If the firmware blob is truncated
>exactly at hdr_sz + offset, reading fw_kpu->entries accesses 4 bytes out of
>bounds. Can this be checked before accessing fw_kpu?

Pre existing issue. Will prepare patch for net tree once this series is merged.

>>
>> +static int npc_alloc_kpu_cam2_n_action2(struct rvu *rvu, int kpu_num,
>> + int num_entries)
>> +{
>> + struct npc_kpu_profile_adapter *adapter = &rvu->kpu;
>> + struct npc_kpu_profile *kpu;
>> +
>> + kpu = &adapter->kpu[kpu_num];
>> +
>> + kpu->cam2 = devm_kcalloc(rvu->dev, num_entries,
>> + sizeof(*kpu->cam2), GFP_KERNEL);
>Does storing device-managed memory in a shared static global array cause
>use-after-free and data races? The adapter->kpu pointer references
>npc_kpu_profiles, which is a static global array shared across all OcteonTX2
>devices.
>If multiple devices are probed, they will race to overwrite these pointers.
>Furthermore, if a device is unbound, the devm framework frees the memory,
>leaving dangling pointers in the global array which will trigger a
>use-after-free on subsequent access.
>

There is only one AF device per system.

>> + fw = rvu->kpu_fwdata;
>>
>> + for (kpu = 0; kpu < fw->kpus; kpu++) {
>> + if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
>> + dev_warn(rvu->dev,
>> + "Profile size mismatch on KPU%i parsing\n",
>> + kpu + 1);
>> + return -EINVAL;
>> + }
>> +
>> + fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
>> + if (fw_kpu->entries < 0) {
>Is there a missing bounds check before dereferencing fw_kpu? The bounds check
>for rvu->kpu_fwdata_sz < hdr_sz + offset only verifies the start of the
>structure.
>Since fw_kpu is located at fw->data + offset, the structure itself resides
>after hdr_sz + offset. If the firmware blob is truncated to exactly
>hdr_sz + offset, reading fw_kpu->entries will access 4 bytes out of bounds.
>

This is an existing issue. I agree, even though we rearranged and renamed functions,
mentioned bug exist. Will prepare a net-next patch to harden these parts once this series
is merged.

> + profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
>> + sizeof(ikpu_action_entries),
>> + GFP_KERNEL);
>> + if (!profile->ikpu2)
>> + return -ENOMEM;
>> +
>> + action = (struct npc_kpu_profile_action *)(fw->data + offset);
>> +
>> + if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
>> + return -EINVAL;
>> +
>> + /* The firmware layout does dependent on the internal size of
>> + * ikpu_action_entries.
>> + */
>> + memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
>> + offset += sizeof(ikpu_action_entries);
>Does tying the firmware ABI to the kernel's internal array size break
>backward and forward compatibility?
>Since ikpu_action_entries is a statically sized kernel array, its size
>increases whenever new port kinds are added in future kernel releases. Older
>firmware binaries will either fail the bounds check or parse data from the
>wrong offset, breaking firmware compatibility.

NO. We keep the same size for all binaries (ikpu_action_entries). We cant introduce
size variable in this structure as it will break backward compatability.

>+ *fw_kpus = fw->kpus;
>> +
>> + sfw = devm_kcalloc(rvu->dev, 1, sizeof(*sfw), GFP_KERNEL);
>> + if (!sfw)
>> + return -ENOMEM;
>> +
>> + memcpy(sfw, fw, sizeof(*sfw));
>Will calling memcpy directly on an IOMEM pointer cause faults on ARM64?
>The KPU profile firmware can be loaded from the firmware database which maps
>the profile as device memory. Using standard memcpy emits unaligned loads or
>DC ZVA instructions that trigger hardware exceptions like alignment faults or
>external aborts on device memory on ARM64 architectures.
>Should memcpy_fromio be used here instead?
>
THis is not an IO memory. This fw loaded memory thru request_firmware_direct()