Re: [PATCH] drm/amdgpu/soc24: reset dGPU if suspend got aborted
From: Alex Deucher
Date: Wed Jun 17 2026 - 10:40:43 EST
Applied. Thanks!
Alex
On Wed, Jun 17, 2026 at 3:54 AM Jakob Linke <jakob@xxxxxxxx> wrote:
>
> For SOC24 ASICs (RDNA4 / Navi 4x dGPUs) re-enabling PM features fails if an
> S3 suspend was aborted. This is the same issue already handled for SOC21
> and SOC15 by:
>
> commit df3c7dc5c58b ("drm/amdgpu: Reset dGPU if suspend got aborted")
> commit 38e8ca3e4b6d ("amdgpu/soc15: enable asic reset for dGPU in case of suspend abort")
>
> The resume that follows an aborted suspend fails with:
>
> amdgpu: SMU: No response msg_reg: 6 resp_reg: 0
> amdgpu: Failed to enable requested dpm features!
> amdgpu: resume of IP block <smu> failed -62
>
> Apply the same workaround for soc24: detect the aborted-suspend state at
> resume via the sign-of-life register and reset the device before re-init.
>
> This is a workaround until a proper solution is finalized.
>
> Fixes: 98b912c50e44 ("drm/amdgpu: Add soc24 common ip block (v2)")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Jakob Linke <jakob@xxxxxxxx>
> ---
> Tested on Navi 44 (RX 9060 XT): recovers the deep->s2idle fallback and pure
> s2idle resumes that otherwise fail with "resume of IP block <smu> failed -62".
> It did not recover every case: one resume still failed under sustained rapid
> s2idle cycling, so like the SOC21/SOC15 versions this is a mitigation, not a
> complete fix. Single suspends in normal use recover.
>
> drivers/gpu/drm/amd/amdgpu/soc24.c | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc24.c b/drivers/gpu/drm/amd/amdgpu/soc24.c
> index ecb6c3fcfbd1..a970d8a76302 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc24.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc24.c
> @@ -521,8 +521,36 @@ static int soc24_common_suspend(struct amdgpu_ip_block *ip_block)
>  	return soc24_common_hw_fini(ip_block);
>  }
>
> +static bool soc24_need_reset_on_resume(struct amdgpu_device *adev)
> +{
> +	u32 sol_reg1, sol_reg2;
> +
> +	/* Will reset for the following suspend abort cases.
> +	 * 1) Only reset dGPU side.
> +	 * 2) S3 suspend got aborted and TOS is active.
> +	 * As for dGPU suspend abort cases the SOL value
> +	 * will be kept as zero at this resume point.
> +	 */
> +	if (!(adev->flags & AMD_IS_APU) && adev->in_s3) {
> +		sol_reg1 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +		msleep(100);
> +		sol_reg2 = RREG32_SOC15(MP0, 0, regMPASP_SMN_C2PMSG_81);
> +
> +		return (sol_reg1 != sol_reg2);
> +	}
> +
> +	return false;
> +}
> +
>  static int soc24_common_resume(struct amdgpu_ip_block *ip_block)
>  {
> +	struct amdgpu_device *adev = ip_block->adev;
> +
> +	if (soc24_need_reset_on_resume(adev)) {
> +		dev_info(adev->dev, "S3 suspend aborted, resetting...");
> +		soc24_asic_reset(adev);
> +	}
> +
>  	return soc24_common_hw_init(ip_block);
>  }
>
> --
> 2.54.0
>
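
For readers following the detection logic outside the kernel tree, here is a
minimal user-space sketch of the check the patch performs: sample the
sign-of-life register twice and ask for a reset only if the value changed, and
only for a dGPU in S3. Everything named here (struct fake_dev, fake_read_sol,
need_reset_on_resume, the sol_step field) is hypothetical and exists only for
illustration. The real code reads regMPASP_SMN_C2PMSG_81 through RREG32_SOC15
and sleeps 100 ms between the reads; the sketch instead advances a simulated
counter on each read. Reading a changing value as "firmware still active" is
an interpretation of the patch comment, not something the patch states
outright.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not a kernel structure. */
struct fake_dev {
	bool is_apu;        /* mirrors (adev->flags & AMD_IS_APU) */
	bool in_s3;         /* mirrors adev->in_s3 */
	uint32_t sol;       /* simulated sign-of-life register value */
	uint32_t sol_step;  /* amount added per read; 0 models a static value */
};

/* Simulated register read: returns the current value, then advances it. */
static uint32_t fake_read_sol(struct fake_dev *dev)
{
	uint32_t value = dev->sol;

	dev->sol += dev->sol_step;
	return value;
}

/*
 * Same decision shape as soc24_need_reset_on_resume() in the patch:
 * only dGPUs in S3 are checked, and a reset is requested when two
 * successive samples of the register differ.
 */
static bool need_reset_on_resume(struct fake_dev *dev)
{
	uint32_t first, second;

	if (dev->is_apu || !dev->in_s3)
		return false;

	first = fake_read_sol(dev);
	/* The patch calls msleep(100) here; the simulation needs no delay. */
	second = fake_read_sol(dev);

	return first != second;
}
```

With sol_step set to 1 the two samples differ and the helper returns true,
which corresponds to the path where the patch calls soc24_asic_reset() before
soc24_common_hw_init(). With sol_step set to 0, or with is_apu set, it returns
false and resume proceeds without a reset.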