Re: [PATCH] drm/msm/dp: do not complete dp_aux_cmd_fifo_tx() if irq is not for aux transfer

From: Doug Anderson
Date: Wed Dec 14 2022 - 17:29:44 EST


Hi,

On Wed, Dec 14, 2022 at 1:21 PM Kuogee Hsieh <quic_khsieh@xxxxxxxxxxx> wrote:
>
> There are 3 possible interrupt sources are handled by DP controller,
> HPDstatus, Controller state changes and Aux read/write transaction.
> At every irq, DP controller have to check isr status of every interrupt
> sources and service the interrupt if its isr status bits shows interrupts
> are pending. There is potential race condition may happen at current aux
> isr handler implementation since it is always complete dp_aux_cmd_fifo_tx()
> even irq is not for aux read or write transaction. This may cause aux read
> transaction return premature if host aux data read is in the middle of
> waiting for sink to complete transferring data to host while irq happen.
> This will cause host's receiving buffer contains unexpected data. This
> patch fixes this problem by checking aux isr and return immediately at
> aux isr handler if there are no any isr status bits set.
>
> Follows are the signature at kernel logs when problem happen,
> EDID has corrupt header
> panel-simple-dp-aux aux-aea0000.edp: Couldn't identify panel via EDID
> panel-simple-dp-aux aux-aea0000.edp: error -EIO: Couldn't detect panel nor find a fallback
>
> Signed-off-by: Kuogee Hsieh <quic_khsieh@xxxxxxxxxxx>
> ---
> drivers/gpu/drm/msm/dp/dp_aux.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/dp/dp_aux.c b/drivers/gpu/drm/msm/dp/dp_aux.c
> index d030a93..8f8b12a 100644
> --- a/drivers/gpu/drm/msm/dp/dp_aux.c
> +++ b/drivers/gpu/drm/msm/dp/dp_aux.c
> @@ -423,6 +423,13 @@ void dp_aux_isr(struct drm_dp_aux *dp_aux)
>
> isr = dp_catalog_aux_get_irq(aux->catalog);
>
> + /*
> + * if this irq is not for aux transfer,
> + * then return immediately
> + */

Why do you need 4 lines for a comment that fits on one line?

> + if (!isr)
> + return;

I can confirm that this works for me. I could reproduce the EDID
problems in the past and I can't after this patch. ...so I could give
a:

Tested-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

I'm not an expert on this part of the code, so feel free to ignore my
other comments if everyone else thinks this patch is fine as-is, but
to me something here feels a little fragile. It feels a little weird
that we'll "complete" for _any_ interrupt that comes through now
rather than relying on dp_aux_native_handler() / dp_aux_i2c_handler()
to specifically identify interrupts that caused the end of the
transfer. I guess that idea is that every possible interrupt we get
causes the end of the transfer?

-Doug