Re: [linux-next:master 14191/14955] vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame

From: Christian König

Date: Wed Jun 24 2026 - 05:48:32 EST

On 6/24/26 11:28, Peter Zijlstra wrote:
> On Wed, Jun 24, 2026 at 01:57:46AM +0500, Mikhail Gavrilov wrote:
>> [+Josh, +Peter: objtool question below]
>>
>> On Tue, Jun 23, 2026 at 8:17 PM kernel test robot <lkp@xxxxxxxxx> wrote:
>>>>> vmlinux.o: error: objtool: amdgpu_vm_handle_fault+0x186: sibling call from callable instruction with modified stack frame
>>
>> I looked into this. It is an objtool false positive on a computed goto,
>> not a problem in the patch, and not specific to clang 22.1.3.
>
> Urrgh, computed gotos :-(
>
>> Config has CONFIG_LTO_CLANG_THIN=y, CONFIG_FRAME_POINTER=y, KASAN and
>> OBJTOOL_WERROR=y. objtool runs at the vmlinux.o link stage (per-TU
>> objects are LLVM bitcode under LTO, so the single amdgpu_vm.o never
>> reaches objtool). The robot hit this with clang 22.1.3; I reproduced it
>> on the same config with my system clang 22.1.8 (CONFIG_CLANG_VERSION=
>> 220108), so it is not a 22.1.3-only codegen issue.
>>
>> What +0x186 actually is (disasm of vmlinux.o, WERROR dropped so the
>> object survives):
>>
>> 17f: 48 c7 c0 00 00 00 00 mov $0x0,%rax
>> R_X86_64_32S .text.amdgpu_vm_handle_fault+0x196
>> 186: ff e0 jmp *%rax
>>
>> %rax is loaded via an R_X86_64_32S relocation with the address of label
>> .text.amdgpu_vm_handle_fault+0x196, and +0x196 is an unconditional jmp
>> back to +0x13e, the head of the second drm_exec_until_all_locked() loop.
>> This is the drm_exec_retry_on_contention() computed goto
>> (goto *__drm_exec_retry_ptr). There is a second identical pair at +0x194
>> -> +0x188 for the first loop. Both targets are inside the function; this
>> is not a tail call into another function. (svm_range_restore_pages() is
>> the inline stub here, CONFIG_HSA_AMD is not set, so that path is gone.)
>>
>> So clang materialized the label address as mov $imm(reloc); jmp *%rax
>> instead of folding it into a direct jmp to the label.
>
> So, if my heat-addled brain isn't completely gone, then this all boils
> down to something like:
>
> drm_exec_init(&exec);
> label:
> for (;drm_exec_cleanup(&exec);) {
> ...
> if (unlikely(drm_exec_is_contended(&exec)))
> goto label;
> ...
> }
> ...
> drm_exec_fini(&exec);
>
> Except, you've laundered that label through a computed goto to allow
> multiple such constructs in a single function -- because can't have
> multiple identical labels etc.

Just FYI: That's actually not the reason for this, the label name is unique now.

The computed goto is used to limit the scope you can use the retry macros because we initially had problems with people using that outside of the loop.

>
> And then clang, can't untangle the web and makes a mess of it. Which is
> really rather unfortunate, because indirect calls are yuck -- also
> retpolines and cfi and all that jazz.
>
> I do think this is very much a compiler issue, clang should never emit
> an indirect call for this. Doing that is just aweful.
>
> Also, note that if anybody were ever to use a guard() inside this
> drm_exec_until_all_locked() construct, there is no guarantee the
> computed goto will actually trip the __cleanup(). Computed goto's and
> scope do not play well together. And the moment clang emits an indirect
> call, it means it lost track of things and things *will* go sideways.

Good to know.

>
> And the only reason you want that label, is because nested loops,
> because without that, a simple 'continue' would do just fine.

Exactly that, yes. The nested loops are unfortunately a very common use case here.

>
> That is, I think this drm_exec_until_all_locked() thing has some
> fundamental problems the moment clang fails to optimize it away.
> Correctness really depends on the compiler not actually ever doing a
> computed goto.

Mhm, that's not good at all but I also of hand don't see a way to avoid the computed goto.

Thanks for pointing out what's wrong here, I was already scratching my head about that quite a bit.

Regards,
Christian.

>
>> For the indirect
>> jmp objtool looks for a jump table in .rodata, finds none (this is a
>> single relocated label, not an indexed table), and falls back to
>> treating it as an indirect sibling call. The frame is already set up
>> (push %rbp at +0x5, sub $0x160,%rsp at +0x16), hence "sibling call with
>> modified stack frame". Runtime is fine, the jmp lands on the intended
>> in-function label; this is purely an objtool classification issue.
>> KASAN probably
>> tips the balance: the function is inflated with __asan_* checks
>> and shadow tests, and on that body clang keeps the label in a register
>> rather than folding it.
>
> Right.
>
>> The drm_exec_until_all_locked() / drm_exec_retry_on_contention() macros
>> are used widely (amdgpu_cs, amdgpu_gem, etc.); the patch under bisect is
>> just the first to put such a loop into amdgpu_vm_handle_fault().
>
> It is also the first to cause clang to loose its mind and emit an
> indirect call. Which as I've explained above, really is a problem.
>
>> objtool question: should objtool resolve an indirect jmp whose target
>> register is loaded from a relocation pointing at a .text label inside
>> the same function, and treat it as an intra-function jump rather than a
>> sibling call? That would cover the labels-as-values / computed-goto
>> pattern that this and any future drm_exec user under LTO+KASAN will hit.
>>
>> I am not respinning the series for this, the page-fault conversion is a
>> pure refactor with no manual stack work. Happy to test an objtool fix,
>> or to help with a drm_exec.h workaround if that is preferred over an
>> objtool change.
>
> While I do think we could teach objtool about this, I also think it
> should still emit a warning, because as stated, this pattern is very
> very suspect and likely broken :-(
>
>