Re: [PATCH] drm/amdgpu: fix zero-size GDS range init on RDNA4

From: Christian König

Date: Tue Apr 21 2026 - 02:44:00 EST


On 4/20/26 23:57, arjan@xxxxxxxxxxxxxxx wrote:
>
> RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
> resources. The gfx_v12_0 initialisation code correctly leaves
> adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
> zero to reflect this.
>
> amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
> each of these resources regardless of size. When the size is zero,
> amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
> which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
> DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
> zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.

Hmm, in general not a bad idea, but we have tons of GFX 12 systems among our test machines and nothing is crashing there.

We are clearly missing something here. Is that on an upstream kernel or something backported?

Regards,
Christian.

>
> Guard against this by returning 0 early from
> amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
> resource manager registration for hardware resources that are absent,
> without affecting any other GPU type.
>
> Link: https://lore.kernel.org/all/bug-221376-2300@xxxxxxxxxxxxxxxxxxxxxxxxx%2F/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
> Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
> Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
> Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
> Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> Cc: "Christian König" <christian.koenig@xxxxxxx>
> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index afaaab6496def..8075ac735321e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -75,6 +75,9 @@ static int amdgpu_ttm_init_on_chip(struct amdgpu_device *adev,
>  				    unsigned int type,
>  				    uint64_t size_in_page)
>  {
> +	if (!size_in_page)
> +		return 0;
> +
>  	return ttm_range_man_init(&adev->mman.bdev, type,
>  				  false, size_in_page);
>  }
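
For readers following the thread, the condition described in the commit message can be modelled in plain userspace C. This is only a rough sketch, not kernel code: range_check_fires(), init_on_chip_model() and the init_result enum are hypothetical names made up for illustration. The only thing taken from the patch description is the expression start + size <= start and the idea of returning early when size_in_page is zero. Whether the check is actually compiled in on a given kernel build is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the expression quoted in the commit message:
 * DRM_MM_BUG_ON(start + size <= start). Returns true when the check
 * would fire, i.e. for a zero size or a range that wraps around.
 */
bool range_check_fires(uint64_t start, uint64_t size)
{
	return start + size <= start;
}

/* Illustrative outcomes only; these are not kernel return codes. */
enum init_result {
	INIT_SKIPPED,     /* guard returned early, nothing registered */
	INIT_REGISTERED,  /* range accepted */
	INIT_WOULD_BUG    /* range check would have fired */
};

/*
 * Hypothetical model of the helper touched by the patch. With
 * guarded == true it mirrors the proposed early return for a zero
 * size_in_page; with guarded == false it mirrors the unpatched path,
 * which forwards a range starting at 0 regardless of size.
 */
enum init_result init_on_chip_model(uint64_t size_in_page, bool guarded)
{
	if (guarded && !size_in_page)
		return INIT_SKIPPED;

	if (range_check_fires(0, size_in_page))
		return INIT_WOULD_BUG;

	return INIT_REGISTERED;
}
```

In this model, init_on_chip_model(0, false) reports INIT_WOULD_BUG while init_on_chip_model(0, true) reports INIT_SKIPPED, and any non-zero size is registered either way, which matches the claim that the guard does not affect hardware that actually has these resources.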