Re: [PATCH] mailbox: pcc: Fix probabilistic command execution timeout

From: lihuisong (C)

Date: Tue Apr 21 2026 - 07:23:53 EST

On 4/21/2026 6:05 PM, Sudeep Holla wrote:
> On Fri, Apr 17, 2026 at 11:14:29AM +0800, Huisong Li wrote:
> > In some scenarios, PCC commands may intermittently time out.
> > This is primarily caused by the chan_in_use flag being updated after
> > ringing the doorbell, coupled with a lack of proper memory barriers
> > across CPU cores.
> >
> > On fast platforms, a race condition occurs: the platform processing
> > completes and triggers an interrupt before the local CPU sets
> > chan_in_use to true. When the interrupt handler pcc_mbox_irq() runs,
> > it reads chan_in_use as false and incorrectly ignores the interrupt.
> >
> > This patch fixes the race by:
> > 1. Moving the chan_in_use update before ringing the doorbell.
> > 2. Using smp_store_release() to ensure the flag update is visible
> >    to other cores before subsequent hardware or software actions
> >    are triggered.
> > 3. Using smp_load_acquire() in the interrupt handler to ensure the
> >    latest flag value is read before deciding to skip the interrupt.
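The ordering problem described above can be illustrated with a minimal, single-threaded C11 sketch. It assumes a worst-case "fast platform" that completes the command and raises the interrupt before the doorbell call even returns. All names here (struct fake_chan, fake_chan_init(), ring_doorbell(), fake_irq_handler(), send_racy_order(), send_fixed_order()) are hypothetical and are not the PCC driver's code; C11 release/acquire atomics stand in for smp_store_release()/smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified channel state (NOT the real PCC structures). */
struct fake_chan {
	atomic_bool chan_in_use; /* true while a command is outstanding */
	atomic_int doorbell;     /* stands in for the doorbell register */
	bool completed;          /* set when the handler accepts the IRQ */
};

static void fake_chan_init(struct fake_chan *c)
{
	atomic_init(&c->chan_in_use, false);
	atomic_init(&c->doorbell, 0);
	c->completed = false;
}

/* Interrupt handler model: ignore the IRQ unless a command is in flight. */
static bool fake_irq_handler(struct fake_chan *c)
{
	/* Acquire load, analogous to smp_load_acquire(&chan_in_use). */
	if (!atomic_load_explicit(&c->chan_in_use, memory_order_acquire))
		return false; /* treated as spurious, like returning IRQ_NONE */
	atomic_store_explicit(&c->chan_in_use, false, memory_order_relaxed);
	c->completed = true;
	return true;
}

/*
 * Worst-case platform: the command completes and the handler runs
 * before this function returns.
 */
static void ring_doorbell(struct fake_chan *c)
{
	/*
	 * Release store: earlier writes by this thread (such as the
	 * chan_in_use update in send_fixed_order()) are visible to any
	 * thread that acquire-loads the doorbell and observes 1.
	 */
	atomic_store_explicit(&c->doorbell, 1, memory_order_release);
	fake_irq_handler(c);
}

/* Ordering described as buggy: ring first, set the flag afterwards. */
static bool send_racy_order(struct fake_chan *c)
{
	ring_doorbell(c); /* the IRQ arrives here and is ignored */
	atomic_store_explicit(&c->chan_in_use, true, memory_order_relaxed);
	return c->completed; /* false: the caller would wait and time out */
}

/* Ordering described by the fix: publish the flag, then ring. */
static bool send_fixed_order(struct fake_chan *c)
{
	/* Analogous to smp_store_release(&chan_in_use, true). */
	atomic_store_explicit(&c->chan_in_use, true, memory_order_release);
	ring_doorbell(c);
	return c->completed; /* true: the completion was observed */
}
```

On a freshly initialized channel, send_racy_order() returns false and leaves chan_in_use stuck at true, while send_fixed_order() returns true and clears the flag. Note that a release store orders earlier accesses before the store itself; in this sketch the flag is kept ahead of the doorbell by the release store on the doorbell, so on real hardware that ordering would have to come from the doorbell write path (the MMIO accessor's documented ordering guarantees).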

> Are you seeing the issue on a real platform, or are you just reviewing the
> code? I would like to test it on the platform I use, but I don't have it
> handy, so it may take some time.

Yes, this is a real issue on my platform. It occurs intermittently,
and the problem is gone after this modification.

Gemini AI helped me analyze the issue and suggested the fix in this patch.


> I have added Robbie King, who has also helped test this PCC driver
> in the past.
Thanks. Please help review this patch.