Re: [net-next v38] mctp pcc: Implement MCTP over PCC Transport

From: Adam Young

Date: Wed Apr 22 2026 - 13:19:48 EST

On 4/9/26 23:24, Jakub Kicinski wrote:

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
mctp pcc: Implement MCTP over PCC Transport

This commit implements a network driver for the Management Component
Transport Protocol (MCTP) over the Platform Communication Channel (PCC).

Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html

diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
new file mode 100644
--- /dev/null
+++ b/drivers/net/mctp/mctp-pcc.c

[ ... ]

+static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
+{

[ ... ]

+ // The message must at least have the PCC command indicating it is an MCTP
+ // message followed by the MCTP header, or we have a malformed message.
+ if (pcc_header.length < sizeof(pcc_header.command) + sizeof(struct mctp_hdr)) {
+ dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+ return;
+ }
+
+ size = pcc_header.length - sizeof(pcc_header.command) + sizeof(pcc_header);

Can this size calculation wrap around? Because pcc_header.length is read from
shared memory, a maliciously large value (e.g. 0xFFFFFFF8) could wrap around
the unsigned 32-bit addition, resulting in a small positive size.

This small positive size would then bypass the check below, and eventually
fail in skb_pull(), leaving the malformed skb to be passed to the network
stack.

This seems legit. The logic to check the size should ensure that the value pulled out of the shared buffer is the size of the buffer, minus the header.

+ // If the reported size is larger than the shared memory, something is wrong
+ // and the best we can do is treat it as corrupted data.
+ if (size > inbox->chan->shmem_size) {
+ dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+ return;
+ }

[ ... ]

+static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
+{

[ ... ]

+ pcc_header = skb_push(skb, sizeof(*pcc_header));
+ pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
+ pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
+ memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
+ pcc_header->length = len + MCTP_SIGNATURE_LENGTH;

Are these assignments writing native endian values into shared memory?

The ACPI Platform Communications Channel specification defines the shared
memory region as strictly little-endian. Writing without cpu_to_le32() will
cause big endian architectures to write byte-swapped metadata, resulting in
dropped packets.

This looks legit, and it reflects some of the early version of my patch where I was doing similar transforms.

The target for this code is only ARM little endian systems, but I think it is more correct to handle the wider cases in case that decision changes.

+
+ rc = mbox_send_message(mpnd->outbox.chan->mchan, skb);
+ if (rc < 0) {

[ ... ]

+static void mctp_pcc_tx_prepare(struct mbox_client *cl, void *mssg)
+{

[ ... ]

+ if (skb->len > outbox->chan->shmem_size) {
+ dev_dstats_tx_dropped(mctp_pcc_ndev->ndev);
+ return;
+ }

If the skb is oversized, this skips the copy and returns, but does it abort
the transmission?

Since mctp_pcc_tx() already submitted the message and tx_prepare() returns
void, the hardware doorbell will still be rung. The remote endpoint would
process the unmodified shared memory, replaying the previous packet's data.

This looks legit. Once we are in tx_prepare the message is going to be sent. All checks like this need to be done in tx

+ memcpy_toio(outbox->chan->shmem, skb->data, skb->len);

Is it possible to read out of bounds here if the skb is fragmented?

The skb->data pointer only points to the linear portion of the packet, while
skb->len represents the total packet length including page fragments.
skb_cow_head() does not linearize the packet, so a call to skb_linearize()
might be needed before copying.

Legit.

While we don't have fragmentation enabled right now, it could very well be enabled in the future, and this should be addressed.

+}
+
+static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
+{
+ struct mctp_pcc_ndev *mctp_pcc_ndev;
+ struct sk_buff *skb = mssg;
+
+ mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
+ dev_dstats_tx_add(mctp_pcc_ndev->ndev, skb->len);

Could this lead to sequence counter corruption and deadlock?

mctp_pcc_tx() runs in process or softirq context and calls
dev_dstats_tx_dropped(). mctp_pcc_tx_done() runs in the controller's hardirq
context and calls dev_dstats_tx_add().

The generic dev_dstats_* macros use u64_stats_update_begin(), which does not
disable interrupts. If a hardware interrupt fires while the softirq path is
inside u64_stats_update_begin(), the hardirq path will re-enter the lock.

I think this is legit. mctp_pcc_tx_done is triggered by the interupt handler (it is the registered call back). While I do prefer to register the packet as being sent at this point, I think that latest it can be done is after the packet is written to the shared region in mctp_pcc_tx_prepare.

+ dev_consume_skb_any(skb);
+ netif_wake_queue(mctp_pcc_ndev->ndev);
+}

[ ... ]

+static int mctp_pcc_ndo_stop(struct net_device *ndev)
+{
+ struct mctp_pcc_ndev *mctp_pcc_ndev;
+ unsigned int count, idx;
+ struct mbox_chan *chan;
+ struct sk_buff *skb;
+
+ mctp_pcc_ndev = netdev_priv(ndev);
+ chan = mctp_pcc_ndev->outbox.chan->mchan;
+
+ scoped_guard(spinlock_irqsave, &chan->lock) {
+ chan->active_req = NULL;

Does this bypass the mailbox API and cause a memory leak?

By explicitly setting active_req to NULL without freeing the in-flight skb,
the skb is leaked.

Additionally, when the hardware completion interrupt fires, mbox_chan_txdone()
will pass NULL as the mssg parameter to mctp_pcc_tx_done(). Since
mctp_pcc_tx_done() dereferences skb->len, this will result in a NULL pointer
dereference.

Legit. I was thinking that the packet pointed to by chan->active_req was still in the ring buffer, but it has already been removed.

+ while (chan->msg_count > 0) {
+ count = chan->msg_count;
+ idx = chan->msg_free;
+ if (idx >= count)
+ idx -= count;
+ else
+ idx += MBOX_TX_QUEUE_LEN - count;

Is this index calculation accurate?

Since msg_free is incremented and msg_count is decremented in each iteration
below, the calculated idx jumps by 2. This skips alternate skbs, leaking
them, and reads uninitialized memory, which is then passed to
dev_consume_skb_any(), causing a use-after-free or double free.

I don't think this is legit. I've written code to test this algorithm, and it works as written. This logic comes from the drivers/mailbox/mailbox.c function msg_submit.

idx does not jump by two each iteration.

+ skb = chan->msg_data[idx];
+ dev_dstats_tx_dropped(ndev);
+ dev_consume_skb_any(skb);
+ chan->msg_count--;
+ chan->msg_free++;
+ }
+ }

[ ... ]

+static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
+{
+ struct mctp_pcc_lookup_context context = {0};
+ struct mctp_pcc_ndev *mctp_pcc_ndev;

[ ... ]

+ status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
+ &context);
+ if (!ACPI_SUCCESS(status)) {
+ dev_err(dev, "FAILED to lookup PCC indexes from CRS\n");
+ return -EINVAL;
+ }

What happens if the _CRS table contains fewer than two PCC_DWORD_TYPE
resources?

If context.index is not verified to be at least 2 here, context.inbox_index
and context.outbox_index might remain uninitialized or at their default zero
values. This could cause the driver to blindly request PCC channel 0, which
may belong to another hardware subsystem.

If that is the case then things are truely corrupted on the system. So, yeah, this is legit, but a sign of things much worse happening.

It is easy enough to check that we get 2 values in the context from walk resources.

+
+ snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
+ ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
+ mctp_pcc_setup);