[REGRESSION 7.2] drm/amdgpu: ~275 SDMA jobs per sparse VA bind since 4cdbba5a (RE Requiem 90->4 fps)
From: Mikhail Gavrilov
Date: Sun Jun 21 2026 - 16:55:46 EST
Hi Christian, Alex,
git bisect points to
4cdbba5a16aa ("drm/amdgpu: restructure VM state machine v4")
as the first bad commit (its parent tests fine) for a severe
interactivity regression.
It was merged during the current 7.2 merge window; it is not in any
released kernel yet and will first appear in 7.2-rc1.
Symptom: Resident Evil Requiem (re9.exe under VKD3D-Proton, RADV, RX
7900 XTX / Navi31, gfx11) drops from ~90 to 3-4 fps the instant the
camera moves; still scenes are fine. The previous bisect point
d352990bcaab is smooth.
The game streams tiled/sparse resources, so every camera move issues a
burst of vkQueueBindSparse, i.e. a burst of DRM_IOCTL_AMDGPU_GEM_VA
going through amdgpu_gem_va_update_vm. The regression is confined to
that path; the CS submission path is unchanged
(bo_update-per-submission is identical on both sides).
Measured effect of the restructure: a single sparse bind now fans
out into hundreds of individual SDMA PTE-write jobs instead of a
couple. Same workload (active camera panning) and identical kernel
config on both bisect points:

good (d352990bcaab), per second under panning:
    amdgpu_gem_va_update_vm      ~1460
    drm_suballoc_new             ~3070
    amdgpu_job_alloc_with_ib     ~3070
    => ~2.1 SDMA jobs per bind

bad (4cdbba5a16aa), per second under the same panning:
    amdgpu_gem_va_update_vm      ~158
    drm_suballoc_new             ~43500
    amdgpu_job_alloc_with_ib     ~43500
    => ~275 SDMA jobs per bind

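
For anyone who wants to re-derive the per-bind figures, they are the
amdgpu_job_alloc_with_ib rate divided by the amdgpu_gem_va_update_vm
rate. A small sketch of that arithmetic, using the approximate
per-second values above (the variable and function names are only
illustrative):

```python
# Approximate per-second call rates taken from the two measurement blocks.
good = {"binds": 1460, "jobs": 3070}    # d352990bcaab
bad = {"binds": 158, "jobs": 43500}     # 4cdbba5a16aa


def jobs_per_bind(sample):
    """SDMA jobs allocated per amdgpu_gem_va_update_vm call."""
    return sample["jobs"] / sample["binds"]


good_jpb = jobs_per_bind(good)
bad_jpb = jobs_per_bind(bad)

# How much the per-bind fan-out grew, and how much the absolute
# job allocation rate grew despite the lower bind rate.
per_bind_ratio = bad_jpb / good_jpb
job_rate_ratio = bad["jobs"] / good["jobs"]

print(f"good: {good_jpb:.1f} jobs/bind")
print(f"bad:  {bad_jpb:.1f} jobs/bind")
print(f"per-bind fan-out grew ~{per_bind_ratio:.0f}x")
print(f"absolute job rate grew ~{job_rate_ratio:.0f}x")
```

The inputs are rounded rates, so the ratios are only good to about two
significant figures.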
So per-bind job/IB count went from ~2 to ~275 (~130x), and the
absolute job_alloc rate rose ~14x (3k -> 43.5k/s) even though the bad
kernel is starved to 4 fps and therefore issues far fewer binds per
second. The PTE-range update inside one bind appears to have lost its
coalescing and now submits roughly one job per fragment.
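
To make "lost its coalescing" concrete, here is a toy model of the
difference between one job per fragment and one job per merged
contiguous range. This is not amdgpu code and says nothing about where
in the driver the merging happened or was lost; the (start, length)
fragment representation and the coalesce() helper are hypothetical,
chosen only to illustrate the suspected behaviour:

```python
def coalesce(fragments):
    """Merge (start, length) fragments that are back-to-back into larger ranges."""
    merged = []
    for start, length in sorted(fragments):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This fragment begins exactly where the previous range ends: extend it.
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged


# Hypothetical bind covering 275 contiguous single-page fragments.
fragments = [(page, 1) for page in range(275)]

jobs_without_coalescing = len(fragments)            # one job per fragment
jobs_with_coalescing = len(coalesce(fragments))     # one job per merged range

print(jobs_without_coalescing, jobs_with_coalescing)
```

In this toy case the merged update needs a single job, while the
per-fragment variant needs one job per page, which is the shape of the
difference seen in the measurements.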
Caveat on magnitude. Both kernels are debug builds (KASAN + LOCKDEP +
PREEMPT_FULL), which is my normal environment. The 90->4 collapse is
dominated by KASAN: in perf the amdgpu_gem_va_update_vm subtree is
~19% of cycles on bad, of which ~17 points are
kasan_save_stack/stack-walking on each allocation, so the
uninstrumented CPU cost of building these jobs is only ~3%. But the
per-bind job *count* is config-independent: ~275 jobs per bind
regardless of build. On this debug kernel that already came to ~43.5k
jobs/s (~14x the old rate); a production kernel could see a higher
bind rate, not lower, so the job rate and the corresponding ring/fence
overhead would be at least as bad.
I have a reliable reproducer and both bisect endpoints built, so I am
glad to test a fix and provide a Tested-by, and to grab any further
traces. The suballocator churn shows up as
drm_suballoc_new/insert/try_free, consistent with the job flood. Full
bisect log and perf/bpftrace captures are available on request.
Kernel .config attached.
Probe: https://linux-hardware.org/?probe=e92f6143b2
Mesa/RADV: 26.1.99 (git 99a268c)
vkd3d-proton: 3.1.0 (ee737e324376289)
Proton: Experimental 11.0 (build 20260617)
#regzbot introduced: 4cdbba5a16aa
--
Thanks,
Mikhail
Attachment:
.config.zip
Description: Zip archive