[PATCH v3 0/8] Improve GPU Recovery
From: Akhil P Oommen
Date: Sat Jul 30 2022 - 05:41:28 EST
Recently, I debugged a few device crashes which occurred during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chances of a successful GPU recovery.
The first is to ensure that the CX GDSC collapses, which clears the internal
state in the GPU's CX domain. The first 5 patches try to handle this.
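As a rough userspace illustration of what "ensure that the CX GDSC collapses"
can look like, here is a bounded polling loop. It is only a sketch:
wait_for_collapse(), fake_gdsc and fake_gdsc_is_off() are hypothetical names,
not functions from this series or from the gpucc driver, and real driver code
would sleep between reads instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status callback: returns true once the domain reports "off". */
typedef bool (*is_off_fn)(void *ctx);

/*
 * Poll the status callback at most max_polls times.
 * Returns 0 if the domain was seen off, -1 if the poll budget ran out.
 */
int wait_for_collapse(is_off_fn is_off, void *ctx, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (is_off(ctx))
			return 0;
	}
	return -1;
}

/* Test double: reports "on" for a fixed number of reads, then "off". */
struct fake_gdsc {
	int reads_until_off;
};

bool fake_gdsc_is_off(void *ctx)
{
	struct fake_gdsc *g = ctx;

	if (g->reads_until_off > 0) {
		g->reads_until_off--;
		return false;
	}
	return true;
}
```

The point of the timeout is that recovery can report a failed collapse
instead of proceeding as if the CX domain state had been cleared.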
The rest of the patches ensure that a few internal blocks like CP, GMU
and GBIF are halted properly before proceeding to a snapshot followed by
recovery. They also handle the 'prepare slumber' HFI failure correctly. These
are A6x-specific improvements.
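The "halt first, then snapshot" rule above can be modelled with a small
userspace sketch. Everything here (fake_block, fake_recovery, fake_halt(),
fake_snapshot()) is hypothetical and only illustrates the ordering
constraint. It does not reflect the actual register sequence or the order in
which the patches halt the blocks.

```c
#include <assert.h>

/* Hypothetical bit flags for the blocks named in the cover letter. */
enum fake_block {
	FAKE_CP   = 1 << 0,
	FAKE_GMU  = 1 << 1,
	FAKE_GBIF = 1 << 2,
};

#define FAKE_ALL_BLOCKS (FAKE_CP | FAKE_GMU | FAKE_GBIF)

struct fake_recovery {
	unsigned int halted;	/* bitmask of blocks halted so far */
	int snapshot_taken;	/* set once a snapshot was allowed */
};

/* Record that one block has been halted. */
void fake_halt(struct fake_recovery *r, enum fake_block b)
{
	r->halted |= b;
}

/* Refuse to snapshot until every block has been halted. */
int fake_snapshot(struct fake_recovery *r)
{
	if ((r->halted & FAKE_ALL_BLOCKS) != FAKE_ALL_BLOCKS)
		return -1;
	r->snapshot_taken = 1;
	return 0;
}
```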
This series is rebased on top of [1], which is based on Linus's master
branch.
[1] https://patchwork.freedesktop.org/series/106860/
Changes in v3:
- Use reset interface from gpucc driver to poll for cx gdsc collapse
  https://patchwork.freedesktop.org/series/106860/
- Use single pm refcount for all active submits
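To illustrate the "single pm refcount for all active submits" idea, here is a
minimal userspace model. It is a sketch under assumptions: fake_gpu,
fake_submit(), fake_retire() and the rpm helpers are hypothetical stand-ins,
not the msm driver's structures or the pm_runtime API. The assumption being
modelled is that one vote is taken when the first submit becomes active and
dropped when the last one retires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device with a runtime PM usage count. */
struct fake_gpu {
	int rpm_usage;		/* stand-in for the runtime PM usage count */
	int active_submits;	/* submits queued but not yet retired */
};

void fake_rpm_get(struct fake_gpu *gpu)
{
	gpu->rpm_usage++;
}

void fake_rpm_put(struct fake_gpu *gpu)
{
	assert(gpu->rpm_usage > 0);
	gpu->rpm_usage--;
}

/* Take the PM vote only on the 0 -> 1 transition. */
void fake_submit(struct fake_gpu *gpu)
{
	if (gpu->active_submits++ == 0)
		fake_rpm_get(gpu);
}

/* Drop the PM vote only on the 1 -> 0 transition. */
void fake_retire(struct fake_gpu *gpu)
{
	assert(gpu->active_submits > 0);
	if (--gpu->active_submits == 0)
		fake_rpm_put(gpu);
}

/* In this model the device may power down only with no votes held. */
bool fake_can_power_collapse(const struct fake_gpu *gpu)
{
	return gpu->rpm_usage == 0;
}
```

With a single vote, a recovery path only has one reference to account for,
regardless of how many submits are in flight.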
Changes in v2:
- Rebased on msm-next tip
Akhil P Oommen (8):
  drm/msm: Remove unnecessary pm_runtime_get/put
  drm/msm: Take single rpm refcount on behalf of all submits
  drm/msm: Correct pm_runtime votes in recover worker
  drm/msm: Fix cx collapse issue during recovery
  drm/msm/a6xx: Ensure CX collapse during gpu recovery
  drm/msm/adreno: Remove a WARN() during runtime_suspend
  drm/msm/a6xx: Improve gpu recovery sequence
  drm/msm/a6xx: Handle GMU prepare-slumber hfi failure
 drivers/gpu/drm/msm/adreno/a6xx.xml.h      |  4 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c      | 83 +++++++++++++++++++-----------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      | 35 +++++++++++--
 drivers/gpu/drm/msm/adreno/adreno_device.c |  7 ---
 drivers/gpu/drm/msm/msm_gpu.c              | 21 +++++---
 drivers/gpu/drm/msm/msm_gpu.h              |  4 ++
 drivers/gpu/drm/msm/msm_ringbuffer.c       |  4 --
 7 files changed, 106 insertions(+), 52 deletions(-)
--
2.7.4