Re: [PATCH 02/14] mailbox: pcc: Always clear the platform ack interrupt first
From: Sudeep Holla
Date: Wed Mar 05 2025 - 09:29:58 EST
On Wed, Mar 05, 2025 at 11:45:35AM +0800, lihuisong (C) wrote:
>
> 在 2025/3/3 18:51, Sudeep Holla 写道:
> > The PCC mailbox interrupt handler (pcc_mbox_irq()) currently checks
> > for command completion flags and any error status before clearing the
> > interrupt.
> >
> > The below sequence highlights an issue in the handling of PCC mailbox
> > interrupts, specifically when dealing with doorbell notifications and
> > acknowledgment between the OSPM and the platform where type3 and type4
> > channels are sharing the interrupt.
> >
> > Platform Firmware OSPM/Linux PCC driver
> > ------------------------------------------------------------------------
> > build message in shmem
> > ring type3 channel doorbell
> > receives the doorbell interrupt
> > process the message from OSPM
> > build response for the message
> > ring the platform ack interrupt to OSPM
> > --->
> > build notification in type4 channel
> > start processing in pcc_mbox_irq()
> > enter pcc handler for type4 chan
> > command complete cleared
> > read the notification
> > <--- clear platform ack irq
> > * no effect from above as platform ack irq *
> > * not yet triggered on this channel *
> > ring the platform ack irq on type4 channel
> > --->
> > leave pcc handler for type4 chan
> > enter pcc handler for type3 chan
> > command complete set
> > read the response
> > <--- clear platform ack irq
> > leave pcc handler for type3 chan
> > leave pcc_mbox_irq() handler
> > start processing in pcc_mbox_irq()
> > enter pcc handler for type4 chan
> > leave pcc handler for type4 chan
> > enter pcc handler for type3 chan
> > leave pcc handler for type3 chan
> > leave pcc_mbox_irq() handler
> This is not easy to understand to me.
> The issue as below described is already very clear to me.
> So suggest remove above flow graph.
I understood it with the graph similar to the one above, though I simplified
it in terms of PCC rather than specific IP reference.
> > The key issue occurs when OSPM tries to acknowledge platform ack
> > interrupt for a notification which is ready to be read and processed
> > but the interrupt itself is not yet triggered by the platform.
> >
> > This ineffective acknowledgment leads to an issue later in time where
> > the interrupt remains pending as we exit the interrupt handler without
> > clearing the platform ack interrupt as there is no pending response or
> > notification. The interrupt acknowledgment order is incorrect.
> >
> Has this issue been confired? It's more better if has the log.😁
> But it seems a valid issue.
Yes Robbie reported this. He is away and can't test or respond until next
week. The log just says there was loads of spurious interrupts and nobody
cared log as you got in the first patch of yours fixing similar race.
> >
> > To resolve this issue, the platform acknowledgment interrupt should
> > always be cleared before processing the interrupt for any notifications
> > or response.
> >
> AFAIC,always clearing the platform ack interrupt first which is also the
> communication flow as ACPI spec described.
Indeed, not sure how we missed it so far.
> I am not sure if it is ok when triggering interrupt and clearing interrupt
> occur concurrently.
Should be OK as we start clearing all the channels that share, if the
handler doesn't clear any source, the interrupt must remain asserted.
> But this scenario is always possible. I think It doesn't matter with this
> patch. It's just my confusion.
Indeed, it can happen any time as you mentioned. No worries better to ask
and clarify than assume. Thanks for your time and review.
--
Regards,
Sudeep