Re: [PATCH] mailbox: pcc: Fix probabilistic command execution timeout

From: Robbie King

Date: Mon Apr 27 2026 - 10:28:49 EST


On 4/21/2026 6:05 AM, Sudeep Holla wrote:
> On Fri, Apr 17, 2026 at 11:14:29AM +0800, Huisong Li wrote:
>> In some scenarios, a PCC command may experience a probabilistic timeout.
>> This is primarily caused by the chan_in_use flag being updated after
>> ringing the doorbell, coupled with a lack of proper memory barriers
>> across CPU cores.
>>
>> On fast platforms, a race condition occurs: the platform processing
>> completes and triggers an interrupt before the local CPU sets
>> chan_in_use to true. When the interrupt handler pcc_mbox_irq() runs,
>> it reads chan_in_use as false and incorrectly ignores the interrupt.
>>
>> This patch fixes the race by:
>> 1. Moving the chan_in_use update before ringing the doorbell.
>> 2. Using smp_store_release() to ensure the flag update is visible
>> to other cores before subsequent hardware or software actions
>> are triggered.
>> 3. Using smp_load_acquire() in the interrupt handler to ensure the
>> latest flag value is read before deciding to skip the interrupt.
>>
>
> Are you seeing the issue on real platforms, or are you just reviewing the
> code? I would like to test it on the platform I use, but I don't have it
> handy, so it may take some time.
>
> I have added Robbie King who also has helped in testing this PCC driver
> in the past.
>

I was unable to apply the patch to our current kernel version. We are in the
process of updating our kernel modules to support 7.0. Once that effort is
finished, I can run a few of our current regressions against the patch.
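
For anyone following the ordering argument in the quoted patch description,
it can be modeled in userspace. What follows is only a minimal sketch, not
the pcc driver's code: C11 atomics stand in for smp_store_release() and
smp_load_acquire(), and struct demo_chan and all demo_* functions are
hypothetical names made up for illustration. The "fast platform" is
simulated by calling the handler directly from the doorbell write, so the
run is single-threaded and deterministic. The release ordering on the
doorbell write is an assumption of the sketch as well.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-channel state; not the driver's struct. */
struct demo_chan {
	atomic_bool chan_in_use;	/* set by the sender, cleared by the handler */
	atomic_int doorbell;		/* stand-in for the doorbell register */
	int handled;			/* completions the handler accepted */
};

static void demo_chan_init(struct demo_chan *c)
{
	atomic_init(&c->chan_in_use, false);
	atomic_init(&c->doorbell, 0);
	c->handled = 0;
}

/* Models the completion interrupt: ignore it unless a command is in flight. */
static bool demo_irq(struct demo_chan *c)
{
	/* Acquire load: userspace analogue of smp_load_acquire(). */
	if (!atomic_load_explicit(&c->chan_in_use, memory_order_acquire))
		return false;	/* treated as "not ours" and skipped */

	atomic_store_explicit(&c->chan_in_use, false, memory_order_release);
	c->handled++;
	return true;
}

/*
 * Models a platform that completes immediately: the completion interrupt
 * fires as soon as the doorbell is written.
 */
static bool demo_ring_doorbell(struct demo_chan *c)
{
	/* Assumption: release ordering stands in for the doorbell write. */
	atomic_fetch_add_explicit(&c->doorbell, 1, memory_order_release);
	return demo_irq(c);
}

/*
 * Sends one command. With flag_first == false the flag is set after the
 * doorbell (the racy order described in the patch); with flag_first == true
 * it is published before the doorbell (the proposed order).
 * Returns true if the completion interrupt was handled.
 */
static bool demo_send(struct demo_chan *c, bool flag_first)
{
	bool handled;

	if (flag_first)	/* release store: analogue of smp_store_release() */
		atomic_store_explicit(&c->chan_in_use, true,
				      memory_order_release);

	handled = demo_ring_doorbell(c);

	if (!flag_first)
		atomic_store_explicit(&c->chan_in_use, true,
				      memory_order_release);

	return handled;
}
```

With flag_first set to false, the simulated interrupt is skipped and
chan_in_use stays set afterwards, which corresponds to the command timing
out. With flag_first set to true, the handler sees the flag and completes
the command. The model does not exercise real cross-CPU visibility; it only
shows why the flag has to be published before the doorbell is rung.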