[PATCH 0/2] drm/lima: fix devfreq refcount imbalance for job timeouts

From: Erico Nunes
Date: Mon Apr 01 2024 - 17:20:41 EST


This is a followup to https://patchwork.freedesktop.org/series/128856/

Patch 1 rev1 from that series
https://patchwork.freedesktop.org/patch/574745/?series=128856&rev=1
was dropped because it needed a better solution for a race condition
between the irq and the timeout handler.
The proposed solution in that discussion is to solve the race condition
by masking the irqs during the timeout handler execution, which is what
is done here.
This bug is very hard to reproduce with regular applications, but I
found it to be reliable to reproduce with a program that triggers many
jobs right in the boundary between timeouting, so that jobs still manage
to complete while the timeout handler runs.

With this series, I was unable to further reproduce the bug.

At first I had only the pp and gp irqs masked and the problem never
reproduced again on Mali-400, but I still managed to reproduce it on
Mali-450 after hours of test time. After masking the pp bcast irq as
well I was not able to reproduce it anymore even on Mali-450, so I think
that was the missing bit for it.

Erico Nunes (2):
drm/lima: add mask irq callback to gp and pp
drm/lima: mask irqs in timeout path before hard reset

drivers/gpu/drm/lima/lima_bcast.c | 12 ++++++++++++
drivers/gpu/drm/lima/lima_bcast.h | 3 +++
drivers/gpu/drm/lima/lima_gp.c | 8 ++++++++
drivers/gpu/drm/lima/lima_pp.c | 18 ++++++++++++++++++
drivers/gpu/drm/lima/lima_sched.c | 9 +++++++++
drivers/gpu/drm/lima/lima_sched.h | 1 +
6 files changed, 51 insertions(+)

--
2.44.0