Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
From: Peter Zijlstra
Date: Wed Jun 24 2026 - 05:33:40 EST
On Wed, Jun 24, 2026 at 01:57:46AM +0500, Mikhail Gavrilov wrote:
> [+Josh, +Peter: objtool question below]
>
> On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@xxxxxxxxx> wrote:
> > >> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
>
> I looked into this. It is an objtool false positive on a computed goto,
> not a problem in the patch, and not specific to clang 22.1.3.
Urrgh, computed gotos :-(
> Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
> OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
> objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
> reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
> on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
> 220108), so it is not a 22.1.3-only codegen issue.
>
> What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
> object survives):
>
> 17f: 48 c7 c0 00 00 00 00 mov $0x0,%rax
> R_X86_64_32S .text.amdgpu_vm_handle_fault+0x196
> 186: ff e0 jmp *%rax
>
> %rax is loaded via an R_X86_64_32S relocation with the address of label
> .text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
> back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
> This is the drm_exec_retry_on_contention() computed goto
> (goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
> -> +0x188 for the first loop. Both targets are inside the function; this
> is not a tail call into another function. (svm_range_restore_pages() is
> the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
>
> So clang materialized the label address as mov $imm(reloc); jmp *%rax
> instead of folding it into a direct jmp to the label.
So, if my heat-addled brain isn't completely gone, then this all boils
down to something like:
drm_exec_init(&exec);
label:
for (;drm_exec_cleanup(&exec);) {
...
if (unlikely(drm_exec_is_contended(&exec)))
goto label;
...
}
...
drm_exec_fini(&exec);
Except, you've laundered that label through a computed goto to allow
multiple such constructs in a single function -- because can't have
multiple identical labels etc.
And then clang, can't untangle the web and makes a mess of it. Which is
really rather unfortunate, because indirect calls are yuck -- also
retpolines and cfi and all that jazz.
I do think this is very much a compiler issue, clang should never emit
an indirect call for this. Doing that is just aweful.
Also, note that if anybody were ever to use a guard() inside this
drm_exec_until_all_locked() construct, there is no guarantee the
computed goto will actually trip the __cleanup(). Computed goto's and
scope do not play well together. And the moment clang emits an indirect
call, it means it lost track of things and things *will* go sideways.
And the only reason you want that label, is because nested loops,
because without that, a simple 'continue' would do just fine.
That is, I think this drm_exec_until_all_locked() thing has some
fundamental problems the moment clang fails to optimize it away.
Correctness really depends on the compiler not actually ever doing a
computed goto.
> For the indirect
> jmp objtool looks for a jump table in .rodata, finds none (this is a
> single relocated label, not an indexed table), and falls back to
> treating it as an indirect sibling call. The frame is already set up
> (push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
> modified stack frame". Runtime is fine, the jmp lands on the intended
> in-function label; this is purely an objtool classification issue.
> KASAN probably
> tips the balance: the function is inflated with __asan_* checks
> and shadow tests, and on that body clang keeps the label in a register
> rather than folding it.
Right.
> The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
> are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
> just the first to put such a loop into amdgpu_vm_handle_fault().
It is also the first to cause clang to loose its mind and emit an
indirect call. Which as I've explained above, really is a problem.
> objtool question: should objtool resolve an indirect jmp whose target
> register is loaded from a relocation pointing at a .text label inside
> the same function, and treat it as an intra-function jump rather than a
> sibling call? That would cover the labels-as-values / computed-goto
> pattern that this and any future drm_exec user under LTO+KASAN will hit.
>
> I am not respinning the series for this, the page-fault conversion is a
> pure refactor with no manual stack work. Happy to test an objtool fix,
> or to help with a drm_exec.h workaround if that is preferred over an
> objtool change.
While I do think we could teach objtool about this, I also think it
should still emit a warning, because as stated, this pattern is very
very suspect and likely broken :-(