RE: [PATCH REGRESSION] Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"

From: Deucher, Alexander
Date: Mon Jan 10 2022 - 11:08:27 EST


[Public]

> -----Original Message-----
> From: Len Brown <lenb417@xxxxxxxxx> On Behalf Of Len Brown
> Sent: Sunday, January 9, 2022 1:12 PM
> To: torvalds@xxxxxxxxxxxxxxxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Len Brown
> <len.brown@xxxxxxxxx>; Chen, Guchun <Guchun.Chen@xxxxxxx>;
> Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; stable@xxxxxxxxxxxxxxx
> Subject: [PATCH REGRESSION] Revert "drm/amdgpu: stop scheduler when
> calling hw_fini (v2)"
>
> From: Len Brown <len.brown@xxxxxxxxx>
>
> This reverts commit f7d6779df642720e22bffd449e683bb8690bd3bf.
>
> This bisected regression has impacted suspend-resume stability since
> 5.15-rc1. It regressed -stable via 5.14.10.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215315
>
> Fixes: f7d6779df64 ("drm/amdgpu: stop scheduler when calling hw_fini (v2)")
> Cc: Guchun Chen <guchun.chen@xxxxxxx>
> Cc: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
> Cc: Christian Koenig <christian.koenig@xxxxxxx>
> Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.14+
> Signed-off-by: Len Brown <len.brown@xxxxxxxxx>

@Chen, Guchun, @Grodzovsky, Andrey, @Koenig, Christian

Any ideas? What's the consequence of reverting this patch? Didn't this patch fix another suspend/resume issue?

Alex

> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 9afd11ca2709..45977a72b5dd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -547,9 +547,6 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
>  		if (!ring || !ring->fence_drv.initialized)
>  			continue;
>
> -		if (!ring->no_scheduler)
> -			drm_sched_stop(&ring->sched, NULL);
> -
>  		/* You can't wait for HW to signal if it's gone */
>  		if (!drm_dev_is_unplugged(adev_to_drm(adev)))
>  			r = amdgpu_fence_wait_empty(ring);
> @@ -609,11 +606,6 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
>  		if (!ring || !ring->fence_drv.initialized)
>  			continue;
>
> -		if (!ring->no_scheduler) {
> -			drm_sched_resubmit_jobs(&ring->sched);
> -			drm_sched_start(&ring->sched, true);
> -		}
> -
>  		/* enable the interrupt */
>  		if (ring->fence_drv.irq_src)
>  			amdgpu_irq_get(adev, ring->fence_drv.irq_src,
> --
> 2.25.1