Re: [PATCH] mailbox: pcc: Fix probabilistic command execution timeout

From: lihuisong (C)

Date: Tue May 26 2026 - 02:28:52 EST



On 4/27/2026 10:28 PM, Robbie King wrote:
On 4/21/2026 6:05 AM, Sudeep Holla wrote:
On Fri, Apr 17, 2026 at 11:14:29AM +0800, Huisong Li wrote:
In some scenarios, a PCC command may experience a probabilistic timeout.
This is primarily caused by the chan_in_use flag being updated after
ringing the doorbell, coupled with a lack of proper memory barriers
across CPU cores.

On fast platforms, a race condition occurs: the platform processing
completes and triggers an interrupt before the local CPU sets
chan_in_use to true. When the interrupt handler pcc_mbox_irq() runs,
it reads chan_in_use as false and incorrectly ignores the interrupt.

This patch fixes the race by:
1. Moving the chan_in_use update before ringing the doorbell.
2. Using smp_store_release() to ensure the flag update is visible
to other cores before subsequent hardware or software actions
are triggered.
3. Using smp_load_acquire() in the interrupt handler to ensure the
latest flag value is read before deciding to skip the interrupt.
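
The ordering change described above can be sketched in userspace C. This is
only a rough model, not the driver code: struct model_chan, model_irq(),
model_ring_doorbell(), model_send_old() and model_send_fixed() are
hypothetical names, C11 release/acquire atomics stand in for
smp_store_release()/smp_load_acquire(), and the "fast platform" is simulated
by running the handler synchronously from the doorbell write. It shows the
effect of moving the flag update, not the cross-CPU barrier behaviour itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-channel state; not the driver's struct. */
struct model_chan {
	atomic_bool chan_in_use;
	int irqs_handled;
	int irqs_ignored;
};

/* Models the interrupt handler: acquire-load the flag before deciding. */
static void model_irq(struct model_chan *ch)
{
	if (!atomic_load_explicit(&ch->chan_in_use, memory_order_acquire)) {
		ch->irqs_ignored++;	/* looks spurious, so it is skipped */
		return;
	}
	ch->irqs_handled++;
	/* Assumption: completion handling marks the channel free again. */
	atomic_store_explicit(&ch->chan_in_use, false, memory_order_release);
}

/* Models a fast platform: the completion interrupt fires immediately. */
static void model_ring_doorbell(struct model_chan *ch)
{
	model_irq(ch);
}

/* Old ordering: doorbell first, flag afterwards. */
static void model_send_old(struct model_chan *ch)
{
	model_ring_doorbell(ch);
	atomic_store_explicit(&ch->chan_in_use, true, memory_order_release);
}

/* Fixed ordering: publish the flag, then ring the doorbell. */
static void model_send_fixed(struct model_chan *ch)
{
	atomic_store_explicit(&ch->chan_in_use, true, memory_order_release);
	model_ring_doorbell(ch);
}
```

In this model, model_send_old() leaves the interrupt ignored and the flag
stuck at true, which corresponds to the command timing out, while
model_send_fixed() lets the handler see the flag and complete the command.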

Are you seeing the issue on real platforms, or are you just reviewing the
code? I would like to test it on the platform I use, but I don't have it
handy, so it may take some time.

I have added Robbie King, who has also helped test this PCC driver
in the past.

I was unable to apply the patch to our current kernel version. We are in the
process of updating our kernel modules to support 7.0. Once that effort is
finished, I can run a few of our current regressions against the patch.

Hi Robbie King,
Could you help test this patch now?
Alternatively, we can resolve the conflict so that it can be tested on your
current kernel.