Re: [PATCH] drm/amdgpu: fix zero-size GDS range init on RDNA4
From: Arjan van de Ven
Date: Tue Apr 21 2026 - 08:00:55 EST
On 4/20/2026 11:42 PM, Christian König wrote:
> On 4/20/26 23:57, arjan@xxxxxxxxxxxxxxx wrote:
>> RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
>> resources. The gfx_v12_0 initialisation code correctly leaves
>> adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
>> zero to reflect this.
>>
>> amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
>> each of these resources regardless of size. When the size is zero,
>> amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
>> which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
>> DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
>> zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.
>
> Mhm, in general not a bad idea, but we have tons of GFX 12 systems among
> our test machines and nothing is crashing there.
>
> We are clearly missing something here. Is that on an upstream kernel or
> something backported?

The reported oops and related logs say 6.18.22, so this does not look like
some unusual backport
(https://bugzilla.kernel.org/show_bug.cgi?id=221376).
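
For reference, the condition described in the commit message can be checked
in isolation. The following is only a userspace sketch, not the patch itself:
range_trips_mm_check() and should_init_on_chip() are hypothetical names, and
the zero-size guard is an assumed way to skip the call, not necessarily what
the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors the expression quoted above from drm_mm_init():
 *     DRM_MM_BUG_ON(start + size <= start)
 * Returns true when the range would trip that check, i.e. when size is
 * zero or when start + size wraps around.
 */
static bool range_trips_mm_check(uint64_t start, uint64_t size)
{
	return start + size <= start;
}

/*
 * Hypothetical guard (an assumption, not taken from the patch): only set
 * up an on-chip range manager when the resource actually has a size.
 */
static bool should_init_on_chip(uint64_t size_in_pages)
{
	return size_in_pages != 0 && !range_trips_mm_check(0, size_in_pages);
}
```

With start = 0 and size = 0, as in the drm_mm_init(mm, 0, 0) call above, the
check evaluates to 0 <= 0 and is true, so any non-zero size test before the
init call would avoid it.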