[PATCH] drm/amdgpu/soc24: reset dGPU if suspend got aborted

From: Jakob Linke
Date: Wed Jun 17 2026 - 02:32:37 EST

On SOC24 ASICs (RDNA4 / Navi 4x dGPUs), re-enabling PM features fails if an
S3 suspend is aborted. This is the same issue already handled for SOC21 and
SOC15 by:

commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of suspend abort")

The aborted resume fails with:

amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
amdgpu: Failed to enable requested dpm features!
amdgpu: resume of IP block <smu> failed -62

Apply the same workaround to SOC24: on resume, detect the aborted-suspend
state via the sign-of-life (SOL) register and reset the ASIC before
re-initializing it.

This is a workaround until a proper solution is finalized.

Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Jakob Linke <jakob@xxxxxxxx>
---
Tested on Navi 44 (RX 9060 XT): this recovers both the deep -> s2idle fallback
and plain s2idle resumes, which otherwise fail with "resume of IP block <smu>
failed -62". It did not recover every case: one resume still failed under
sustained rapid s2idle cycling, so, like the SOC21/SOC15 versions, this is a
mitigation rather than a complete fix. Single suspend/resume cycles in normal
use recover.

drivers/gpu/drm/amd/amdgpu/soc24.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
index ecb6c3fcfbd1..a970d8a76302 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc24.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
@@ -521,8 +521,36 @@ static int soc24_common_suspend(struct amdgpu_ip_block *ip_block)
 	return soc24_common_hw_fini(ip_block);
 }

+static bool soc24_need_reset_on_resume(struct amdgpu_device *adev)
+{
+	u32 sol_reg1, sol_reg2;
+
+	/* Will reset for the following suspend abort cases.
+	 * 1) Only reset dGPU side.
+	 * 2) S3 suspend got aborted and TOS is active.
+	 * As for dGPU suspend abort cases the SOL value
+	 * will be kept as zero at this resume point.
+	 */
+	if (!(adev->flags & AMD_IS_APU) && adev->in_s3) {
+		sol_reg1 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
+		msleep(100);
+		sol_reg2 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
+
+		return (sol_reg1 != sol_reg2);
+	}
+
+	return false;
+}
+
 static int soc24_common_resume(struct amdgpu_ip_block *ip_block)
 {
+	struct amdgpu_device *adev = ip_block->adev;
+
+	if (soc24_need_reset_on_resume(adev)) {
+		dev_info(adev->dev, "S3 suspend aborted, resetting...\n");
+		soc24_asic_reset(adev);
+	}
+
 	return soc24_common_hw_init(ip_block);
 }

--
2.54.0