[PATCH AUTOSEL 7.0-6.18] drm/amdgpu: fix syncobj leak for amdgpu_gem_va_ioctl()

From: Sasha Levin

Date: Mon Apr 20 2026 - 10:49:00 EST


From: Prike Liang <Prike.Liang@xxxxxxx>

[ Upstream commit a0f0b6d31a53a7607ed44f7623faafc628333258 ]

It requires freeing the syncobj and chain
allocation resource.

Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>
Reviewed-by: Christian König <christian.koenig@xxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---

LLM Generated explanations, may be completely bogus:

The 7.0 tree doesn't have `(!adev->debug_vm || timeline_syncobj)` — it
has `!adev->debug_vm`. This confirms the diff was created against a
newer mainline. The fix's core logic still applies.

---

## Complete Analysis

### PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `drm/amdgpu`
- Action verb: "fix"
- Summary: Fix syncobj and chain allocation resource leak in
amdgpu_gem_va_ioctl()
- Record: [drm/amdgpu] [fix] [syncobj/chain resource leak in gem VA
ioctl]

**Step 1.2: Tags**
- `Reviewed-by: Christian König <christian.koenig@xxxxxxx>` — subsystem
co-maintainer
- `Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>` — AMD GPU
maintainer committed it
- `Signed-off-by: Prike Liang <Prike.Liang@xxxxxxx>` — AMD engineer,
author
- No Fixes: tag, no Reported-by:, no Cc: stable — expected for manual
review candidates
- Record: Reviewed by Christian König (DRM/amdgpu co-maintainer).
Committed by Alex Deucher.

**Step 1.3: Commit Body**
- Describes: "requires freeing the syncobj and chain allocation
resource"
- Bug: syncobj refcount and chain memory are never released after use
- Failure mode: resource/memory leak on every ioctl call with timeline
syncobj
- Record: Clear resource leak. Every call to the ioctl with timeline
syncobj leaks memory.

**Step 1.4: Hidden Bug Fixes**
- This is NOT hidden — it explicitly says "fix...leak"
- Record: Explicit bug fix.

### PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Files: `drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c` only
- Changes: +5 lines added (3 in ioctl cleanup, 1 NULL assignment in
helper, 1 NULL assignment in ioctl)
- Functions modified: `amdgpu_gem_update_timeline_node()` and
`amdgpu_gem_va_ioctl()`
- Record: Single-file surgical fix, 5 meaningful lines added.

**Step 2.2: Code Flow Changes**

Hunk 1 — `amdgpu_gem_update_timeline_node()`:
- BEFORE: When `dma_fence_chain_alloc()` fails, calls
`drm_syncobj_put(*syncobj)` and returns -ENOMEM, leaving `*syncobj` as
a dangling pointer.
- AFTER: Also sets `*syncobj = NULL` to prevent dangling pointer.
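
The effect of the added NULL assignment can be sketched in userspace C.
This is a minimal model, not the driver code: `mock_syncobj`, `mock_put`,
`mock_update_timeline_node` and `mock_caller` are hypothetical stand-ins,
and it assumes the caller reaches a shared cleanup path that puts any
non-NULL syncobj after the helper fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted syncobj. */
struct mock_syncobj { int refcount; };

static void mock_put(struct mock_syncobj *s)
{
	assert(s->refcount > 0);	/* catches a double put */
	s->refcount--;
}

/* fail_alloc != 0 models dma_fence_chain_alloc() returning NULL. */
static int mock_update_timeline_node(struct mock_syncobj *found,
				     int fail_alloc,
				     struct mock_syncobj **syncobj)
{
	*syncobj = found;
	(*syncobj)->refcount++;		/* models the lookup taking a reference */

	if (fail_alloc) {
		mock_put(*syncobj);
		*syncobj = NULL;	/* caller must not put it a second time */
		return -ENOMEM;
	}
	return 0;
}

static int mock_caller(struct mock_syncobj *found, int fail_alloc)
{
	struct mock_syncobj *timeline_syncobj = NULL;
	int r;

	r = mock_update_timeline_node(found, fail_alloc, &timeline_syncobj);

	/* Shared cleanup: only puts what the helper left behind. */
	if (timeline_syncobj)
		mock_put(timeline_syncobj);
	return r;
}
```

Without the `*syncobj = NULL` line, the cleanup in `mock_caller()` would
drop the reference a second time on the failure path.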

Hunk 2 — `amdgpu_gem_va_ioctl()`:
- BEFORE: After `drm_syncobj_add_point()` consumes `timeline_chain`,
`timeline_chain` still points to consumed memory. The `error:` label
never frees `timeline_chain` or puts `timeline_syncobj`.
- AFTER: Sets `timeline_chain = NULL` after consumption. Adds
`dma_fence_chain_free(timeline_chain)` and
`drm_syncobj_put(timeline_syncobj)` to cleanup.
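
The NULL-after-consume plus free/put-at-label shape described above can be
modeled in userspace C. This is a sketch under stated assumptions: every
`mock_*` name is hypothetical, `live_chains` is a counter added only for
illustration, and the mock consumer releases the chain immediately, whereas
the real `drm_syncobj_add_point()` links it into the timeline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel objects. */
struct mock_syncobj { int refcount; };
struct mock_chain { int placeholder; };

static int live_chains;	/* chain nodes currently allocated */

static struct mock_chain *mock_chain_alloc(void)
{
	struct mock_chain *c = calloc(1, sizeof(*c));

	if (c)
		live_chains++;
	return c;
}

/* Accepts NULL, like kfree(). */
static void mock_chain_free(struct mock_chain *c)
{
	if (!c)
		return;
	live_chains--;
	free(c);
}

/* Takes ownership of c; the mock simply releases it. */
static void mock_add_point(struct mock_syncobj *s, struct mock_chain *c)
{
	(void)s;
	mock_chain_free(c);
}

static void mock_syncobj_put(struct mock_syncobj *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/* publish != 0 models the branch that hands the chain to the syncobj. */
static int mock_va_ioctl(struct mock_syncobj *obj, int publish)
{
	struct mock_syncobj *timeline_syncobj = obj;
	struct mock_chain *timeline_chain;
	int r = 0;

	timeline_syncobj->refcount++;	/* models the lookup taking a reference */
	timeline_chain = mock_chain_alloc();
	if (!timeline_chain) {
		r = -1;
		goto error;
	}

	if (publish) {
		mock_add_point(timeline_syncobj, timeline_chain);
		timeline_chain = NULL;	/* ownership moved; do not free again */
	}

error:
	mock_chain_free(timeline_chain);	/* no-op when NULL */
	if (timeline_syncobj)
		mock_syncobj_put(timeline_syncobj);
	return r;
}
```

Both the publishing and the non-publishing path leave the reference count
and the chain counter where they started, which is the property the patch
is meant to restore.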

**Step 2.3: Bug Mechanism**
- Category: **Resource leak** (syncobj refcount leak + memory leak)
- `drm_syncobj_find()` increments refcount — never decremented by caller
- `dma_fence_chain_alloc()` allocates memory — never freed when not
consumed
- Record: Missing cleanup for refcounted object and allocated memory on
both success and error paths.
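
To illustrate why an unpaired lookup accumulates, here is a small
userspace model. It is only a sketch: `mock_find`, `mock_put` and
`refs_after_calls` are hypothetical names, and the loop stands in for
repeated ioctl calls on the same syncobj.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted syncobj. */
struct mock_syncobj { int refcount; };

/* Models a lookup that returns the object with an extra reference. */
static struct mock_syncobj *mock_find(struct mock_syncobj *s)
{
	s->refcount++;
	return s;
}

static void mock_put(struct mock_syncobj *s)
{
	assert(s->refcount > 0);
	s->refcount--;
}

/*
 * Returns the reference count after `calls` simulated ioctl calls.
 * put_on_exit == 0 models the missing cleanup, != 0 the paired put.
 */
static int refs_after_calls(int calls, int put_on_exit)
{
	struct mock_syncobj obj = { .refcount = 1 };

	for (int i = 0; i < calls; i++) {
		struct mock_syncobj *s = mock_find(&obj);

		if (put_on_exit)
			mock_put(s);
	}
	return obj.refcount;
}
```

In this model the count grows by one per call when the put is missing, so
the object can never reach zero and be released.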

**Step 2.4: Fix Quality**
- Obviously correct: adds standard cleanup patterns (NULL-after-consume,
free/put at error label)
- Minimal and surgical: 5 meaningful lines
- Very low regression risk: `dma_fence_chain_free(NULL)` reduces to
`kfree(NULL)`, which is a no-op, and `drm_syncobj_put` is guarded by a
NULL check
- Record: High quality, very low regression risk.

### PHASE 3: GIT HISTORY

**Step 3.1: Blame**
- `amdgpu_gem_update_timeline_node` — introduced by `70773bef4e091f`
(Arvind Yadav, Sep 2024)
- Timeline call moved before switch by `ad6c120f688803` (Feb 2025, "fix
the memleak caused by fence not released")
- Inline timeline handling in ioctl by `bd8150a1b3370` (Dec 2025, v4
refactor)
- Record: Buggy code introduced in 70773bef4e091f, worsened by
ad6c120f688803 which moved allocation before switch but didn't add
cleanup.

**Step 3.2: Fixes tag**
- No Fixes: tag present. Based on analysis, the bug was introduced in
`70773bef4e091f` and never had proper cleanup.
- Record: Bug exists since original timeline code introduction.

**Step 3.3: File History**
- 31 commits since `ad6c120f688803`. Active file with many recent
changes.
- The v4 refactor (`bd8150a1b3370`) and v7 refactor (`efdc66fe12b07`)
touched the same code but neither added cleanup.
- Record: Standalone fix, no prerequisites beyond code already in 7.0
tree.

**Step 3.4: Author**
- Prike Liang: AMD engineer, regular contributor to amdgpu driver with
multiple recent fixes.
- Record: Active AMD GPU developer, credible author.

**Step 3.5: Dependencies**
- None. The fix only adds cleanup to existing code paths. All referenced
functions exist in 7.0.
- Minor context conflict: mainline has `(!adev->debug_vm ||
timeline_syncobj)` vs 7.0's `!adev->debug_vm`, but the fix's added
lines don't depend on this condition.
- Record: Standalone fix, minor context adjustment needed.

### PHASE 4: MAILING LIST RESEARCH

**Step 4.1-4.5:**
- b4 dig could not find the original patch submission (lore.kernel.org
blocked by Anubis).
- The related commit `ad6c120f688803` explicitly described the memleak
problem with a full stack trace showing BUG in drm_sched_fence slab
during module unload — evidence the leak has real impact.
- Christian König (co-maintainer) reviewed the fix.
- Record: Could not access lore. However, the reviewer is the subsystem
co-maintainer, which is a strong endorsement.

### PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1-5.4:**
- `amdgpu_gem_va_ioctl()` is a DRM ioctl handler directly callable from
userspace
- Called every time userspace maps/unmaps GPU virtual address space
- This is a HOT path for GPU applications (Mesa, AMDVLK, ROCm)
- Every call with a timeline syncobj leaks the syncobj refcount and
potentially the chain allocation
- Record: Ioctl path reachable from any GPU userspace application. Very
high call frequency.

### PHASE 6: STABLE TREE ANALYSIS

**Step 6.1:** The buggy code exists in 7.0 tree. Confirmed via blame:
`70773bef4e091f` (Sep 2024) and `ad6c120f688803` (Feb 2025) are both
present.

**Step 6.2:** Minor context conflict because the condition at line 979
differs between mainline and 7.0. A trivial backport adjustment is
needed, or `git apply --3way` could handle it.

**Step 6.3:** No related fix already in stable for this specific leak.

### PHASE 7: SUBSYSTEM CONTEXT

- Subsystem: `drivers/gpu/drm/amd/amdgpu` — GPU driver
- Criticality: IMPORTANT — AMD GPUs are extremely common in desktops,
servers, and workstations
- Active subsystem with frequent changes
- Record: [IMPORTANT] AMD GPU driver, widely used hardware.

### PHASE 8: IMPACT AND RISK

**Step 8.1:** Affected users: All users with AMD GPUs using
userqueue/timeline syncobj features (Mesa Vulkan, ROCm).

**Step 8.2:** Trigger: Any GPU application calling the VA ioctl with a
timeline syncobj. Repeated calls (normal GPU operation) cause cumulative
memory leak.

**Step 8.3:** Failure mode: Memory leak in hot ioctl path → eventual OOM
under sustained GPU workloads. Severity: **HIGH** (gradual resource
exhaustion).

**Step 8.4:**
- BENEFIT: High — prevents memory leak in frequently-called GPU ioctl
- RISK: Very low — 5-line fix adding standard NULL-and-free patterns,
reviewed by subsystem maintainer
- Record: High benefit, very low risk.

### PHASE 9: SYNTHESIS

**Evidence FOR backporting:**
- Fixes a real resource leak (syncobj refcount + chain memory) in a
userspace-facing ioctl
- Every call with timeline syncobj leaks resources — cumulative, leads
to OOM under sustained use
- Small, surgical fix: 5 meaningful lines in a single file
- Reviewed by Christian König (DRM/amdgpu co-maintainer)
- Obviously correct: standard cleanup patterns (NULL-after-consume,
free/put at cleanup label)
- Buggy code exists in 7.0 tree (confirmed via blame)
- Very low regression risk: `kfree(NULL)` is safe, NULL checks guard all
puts

**Evidence AGAINST backporting:**
- Minor context conflict (condition text differs between mainline and
7.0) — trivially resolvable
- Timeline syncobj feature is relatively new (Sep 2024) — may not affect
all users yet

**Stable rules checklist:**
1. Obviously correct and tested? **YES** — standard cleanup pattern,
reviewed by co-maintainer
2. Fixes a real bug? **YES** — resource leak in ioctl path
3. Important issue? **YES** — memory leak in hot path → eventual OOM
4. Small and contained? **YES** — 5 lines, single file
5. No new features? **YES** — only adds missing cleanup
6. Can apply to stable? **YES** — with minor context adjustment

### Verification

- [Phase 1] Parsed tags: Reviewed-by Christian König (co-maintainer),
SOB Alex Deucher (maintainer)
- [Phase 2] Diff analysis: 5 lines added — NULL assignment in helper
error path, NULL assignment after chain consumption, 3-line cleanup in
error label (chain free + syncobj put)
- [Phase 3] git blame: timeline code introduced in 70773bef4e091f (Sep
2024), moved by ad6c120f688803 (Feb 2025), both in 7.0 tree
- [Phase 3] git show ad6c120f688803: confirmed this commit moved the
timeline allocation before the switch without adding cleanup, which
worsened the leak introduced in 70773bef4e091f
- [Phase 3] git show bd8150a1b3370: v4 refactor inlined timeline
handling, still no cleanup
- [Phase 3] git show efdc66fe12b07: v7 refactor, still no cleanup
- [Phase 5] amdgpu_gem_va_ioctl is DRM ioctl handler — directly callable
from userspace, hot path for GPU apps
- [Phase 5] Confirmed drm_syncobj_add_point() consumes chain
(dma_fence_chain_init + rcu_assign_pointer), so NULL-after-use is
correct
- [Phase 5] Confirmed dma_fence_chain_free(NULL) is safe (just
kfree(NULL))
- [Phase 6] Verified no drm_syncobj_put(timeline_syncobj) in current 7.0
file — bug confirmed present
- [Phase 6] Minor context conflict: 7.0 has `!adev->debug_vm`, mainline
has `(!adev->debug_vm || timeline_syncobj)` — needs trivial adjustment
- [Phase 8] Failure mode: cumulative memory/refcount leak → eventual
OOM, severity HIGH
- UNVERIFIED: Could not access lore.kernel.org for original patch
discussion (blocked by Anubis)

**YES**

drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index c4839cf2dce37..3f95aca700264 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -107,6 +107,7 @@ amdgpu_gem_update_timeline_node(struct drm_file *filp,
*chain = dma_fence_chain_alloc();
if (!*chain) {
drm_syncobj_put(*syncobj);
+ *syncobj = NULL;
return -ENOMEM;
}

@@ -983,6 +984,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
timeline_chain,
fence,
args->vm_timeline_point);
+ timeline_chain = NULL;
}
}
dma_fence_put(fence);
@@ -990,6 +992,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
}

error:
+ dma_fence_chain_free(timeline_chain);
+ if (timeline_syncobj)
+ drm_syncobj_put(timeline_syncobj);
drm_exec_fini(&exec);
error_put_gobj:
drm_gem_object_put(gobj);
--
2.53.0