Re: [net-next v38] mctp pcc: Implement MCTP over PCC Transport

From: Jakub Kicinski

Date: Thu Apr 09 2026 - 23:25:41 EST

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
mctp pcc: Implement MCTP over PCC Transport

This commit implements a network driver for the Management Component
Transport Protocol (MCTP) over the Platform Communication Channel (PCC).

Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html

> diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/net/mctp/mctp-pcc.c

[ ... ]

> +static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
> +{

[ ... ]

> + // The message must at least have the PCC command indicating it is an MCTP
> + // message followed by the MCTP header, or we have a malformed message.
> + if (pcc_header.length < sizeof(pcc_header.command) + sizeof(struct mctp_hdr)) {
> + dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
> + return;
> + }
> +
> + size = pcc_header.length - sizeof(pcc_header.command) + sizeof(pcc_header);

Can this size calculation wrap around? Because pcc_header.length is read from
shared memory, a maliciously large value (e.g. 0xFFFFFFF8) could wrap around
the unsigned 32-bit addition, resulting in a small positive size.

This small positive size would then bypass the check below, and eventually
fail in skb_pull(), leaving the malformed skb to be passed to the network
stack.

> + // If the reported size is larger than the shared memory, something is wrong
> + // and the best we can do is treat it as corrupted data.
> + if (size > inbox->chan->shmem_size) {
> + dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
> + return;
> + }

[ ... ]

> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
> +{

[ ... ]

> + pcc_header = skb_push(skb, sizeof(*pcc_header));
> + pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
> + pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
> + memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
> + pcc_header->length = len + MCTP_SIGNATURE_LENGTH;

Are these assignments writing native endian values into shared memory?

The ACPI Platform Communications Channel specification defines the shared
memory region as strictly little-endian. Writing without cpu_to_le32() will
cause big endian architectures to write byte-swapped metadata, resulting in
dropped packets.

> +
> + rc = mbox_send_message(mpnd->outbox.chan->mchan, skb);
> + if (rc < 0) {

[ ... ]

> +static void mctp_pcc_tx_prepare(struct mbox_client *cl, void *mssg)
> +{

[ ... ]

> + if (skb->len > outbox->chan->shmem_size) {
> + dev_dstats_tx_dropped(mctp_pcc_ndev->ndev);
> + return;
> + }

If the skb is oversized, this skips the copy and returns, but does it abort
the transmission?

Since mctp_pcc_tx() already submitted the message and tx_prepare() returns
void, the hardware doorbell will still be rung. The remote endpoint would
process the unmodified shared memory, replaying the previous packet's data.

> + memcpy_toio(outbox->chan->shmem, skb->data, skb->len);

Is it possible to read out of bounds here if the skb is fragmented?

The skb->data pointer only points to the linear portion of the packet, while
skb->len represents the total packet length including page fragments.
skb_cow_head() does not linearize the packet, so a call to skb_linearize()
might be needed before copying.

> +}
> +
> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev;
> + struct sk_buff *skb = mssg;
> +
> + mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
> + dev_dstats_tx_add(mctp_pcc_ndev->ndev, skb->len);

Could this lead to sequence counter corruption and deadlock?

mctp_pcc_tx() runs in process or softirq context and calls
dev_dstats_tx_dropped(). mctp_pcc_tx_done() runs in the controller's hardirq
context and calls dev_dstats_tx_add().

The generic dev_dstats_* macros use u64_stats_update_begin(), which does not
disable interrupts. If a hardware interrupt fires while the softirq path is
inside u64_stats_update_begin(), the hardirq path will re-enter the lock.

> + dev_consume_skb_any(skb);
> + netif_wake_queue(mctp_pcc_ndev->ndev);
> +}

[ ... ]

> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev;
> + unsigned int count, idx;
> + struct mbox_chan *chan;
> + struct sk_buff *skb;
> +
> + mctp_pcc_ndev = netdev_priv(ndev);
> + chan = mctp_pcc_ndev->outbox.chan->mchan;
> +
> + scoped_guard(spinlock_irqsave, &chan->lock) {
> + chan->active_req = NULL;

Does this bypass the mailbox API and cause a memory leak?

By explicitly setting active_req to NULL without freeing the in-flight skb,
the skb is leaked.

Additionally, when the hardware completion interrupt fires, mbox_chan_txdone()
will pass NULL as the mssg parameter to mctp_pcc_tx_done(). Since
mctp_pcc_tx_done() dereferences skb->len, this will result in a NULL pointer
dereference.

> + while (chan->msg_count > 0) {
> + count = chan->msg_count;
> + idx = chan->msg_free;
> + if (idx >= count)
> + idx -= count;
> + else
> + idx += MBOX_TX_QUEUE_LEN - count;

Is this index calculation accurate?

Since msg_free is incremented and msg_count is decremented in each iteration
below, the calculated idx jumps by 2. This skips alternate skbs, leaking
them, and reads uninitialized memory, which is then passed to
dev_consume_skb_any(), causing a use-after-free or double free.

> + skb = chan->msg_data[idx];
> + dev_dstats_tx_dropped(ndev);
> + dev_consume_skb_any(skb);
> + chan->msg_count--;
> + chan->msg_free++;
> + }
> + }

[ ... ]

> +static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
> +{
> + struct mctp_pcc_lookup_context context = {0};
> + struct mctp_pcc_ndev *mctp_pcc_ndev;

[ ... ]

> + status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
> + &context);
> + if (!ACPI_SUCCESS(status)) {
> + dev_err(dev, "FAILED to lookup PCC indexes from CRS\n");
> + return -EINVAL;
> + }

What happens if the _CRS table contains fewer than two PCC_DWORD_TYPE
resources?

If context.index is not verified to be at least 2 here, context.inbox_index
and context.outbox_index might remain uninitialized or at their default zero
values. This could cause the driver to blindly request PCC channel 0, which
may belong to another hardware subsystem.

> +
> + snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
> + ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
> + mctp_pcc_setup);
--
pw-bot: cr