Re: [PATCH v5 fwctl 4/5] fwctl/bnxt_fwctl: Add bnxt fwctl device

From: Jason Gunthorpe

Date: Fri Mar 13 2026 - 12:19:07 EST

On Thu, Feb 26, 2026 at 12:23:17AM -0800, Pavan Chebbi wrote:
> + rpc_in.msg_len = msg->req_len;
> + rpc_in.resp = kzalloc(*out_len, GFP_KERNEL);
> + if (!rpc_in.resp) {
> + err = -ENOMEM;
> + goto free_msg_out;
> + }
> +
> + rpc_in.resp_max_len = *out_len;
> + if (!msg->timeout)
> + rpc_in.timeout = FWCTL_BNXT_HWRM_DFLT_TIMEOUT;
> + else
> + rpc_in.timeout = msg->timeout;
> +
> + rc = bnxt_send_msg(bnxt_aux_priv->edev, &rpc_in);
> + if (rc) {
> + struct output *resp = rpc_in.resp;
> +
> + /* Copy the response to user always, as it contains
> + * detailed status of the command failure
> + */
> + if (!resp->error_code)
> + /* bnxt_send_msg() returned much before FW
> + * received the command.
> + */
> + resp->error_code = rc;
> + }
> +free_msg_out:
> + kfree(rpc_in.msg);

Timeout is such a complicated thing to add to a HW RPC interface. Does
bnxt do it right? Claude says no, and it looks compelling to me..

So don't give userspace an easy ability to trigger timeout, by
lowering the timeout value, and causing corruption in the kernel.

If you hit a FW timeout you probably have to FLR the device to recover
from it, if fwctl is going to have a timeout knob it has to leave the
request active in the kernel and orphan it to avoid a FLR - not sure
that makes much sense.

>> I want to investigate what the state is after this function fails,
>> particularly if it fails for a timeout. Is there any cahnce that
>> fw_msg->resp can be written too after the function returns because
>> it is still programmined into HW as a DMA?

DMA Safety Analysis: bnxt_send_msg() Timeout Path

Overview

Investigation of whether firmware can DMA into a freed buffer after
bnxt_send_msg() returns due to a timeout.

How resp_addr Gets Programmed Into HW

At bnxt_hwrm.c:90, the DMA address of the response buffer is embedded in
the request message itself:

ctx->req->resp_addr = cpu_to_le64(dma_handle + BNXT_HWRM_RESP_OFFSET);

This address is sent to the firmware as part of the HWRM command. The
firmware caches it and will DMA the response to that address when it
completes.

What Happens on Timeout

There are multiple timeout goto exit paths (lines 587, 645, 668 in
bnxt_hwrm.c). All jump to the exit label at line 685:

exit:
if (token)
__hwrm_release_token(bp, token);
if (ctx->flags & BNXT_HWRM_INTERNAL_CTX_OWNED)
ctx->flags |= BNXT_HWRM_INTERNAL_RESP_DIRTY;
else
__hwrm_ctx_drop(bp, ctx);

No abort command is sent to the firmware. No mechanism exists to tell
the HW to stop using resp_addr.

The Specific Flow in bnxt_send_msg()

Since hwrm_req_hold() is called, the request is “owned”, so on timeout:

1. __hwrm_send() sets BNXT_HWRM_INTERNAL_RESP_DIRTY flag and returns
error.
2. Back in bnxt_send_msg(), resp_len is read from resp->resp_len. On a
timeout this is likely 0 (buffer was zeroed at alloc), so the memcpy
is skipped.
3. hwrm_req_drop() calls __hwrm_ctx_drop() which calls dma_pool_free()
at line 313.

The DMA Hazard

After dma_pool_free(), the buffer goes back to the pool. But the
firmware still has resp_addr and may still DMA the response to it. This
means:

1. The pool buffer can be reallocated for a new HWRM request via
dma_pool_alloc().
2. The firmware can write to it at any time – corrupting the new
request’s data or response.

The BNXT_HWRM_INTERNAL_RESP_DIRTY flag only causes a memset on the next
use of the same buffer (around line 476), but that doesn’t help – it’s a
race. The firmware DMA can arrive after the memset, after the new
request is in flight, or even after the new response is written.

Regarding fw_msg->resp Specifically

In bnxt_send_msg() itself, fw_msg->resp is not a DMA target – it’s a
plain kernel buffer that gets memcpy’d into. So fw_msg->resp won’t be
directly corrupted by late DMA. The DMA target is the internal ctx->resp
buffer from the pool.

Summary

-----------------------------------------------------------------------
Aspect Finding
----------------------------------- -----------------------------------
fw_msg->resp written by late DMA? No – it’s a memcpy destination, not
a DMA target

Pool DMA buffer at risk? Yes – firmware retains resp_addr
after timeout

Firmware abort on timeout? No – no abort/stop mechanism exists

DMA unmap on timeout? No – dma_pool_free() returns buffer
to pool without unmapping

Synchronization barrier? No – no barriers or delays before
freeing

Corruption scenario Firmware completes late, DMA writes
into reallocated pool buffer used
by a new HWRM command
-----------------------------------------------------------------------