Re: [PATCH] perf/x86/intel: Mark expected switch fall-throughs

From: Peter Zijlstra
Date: Wed Jun 26 2019 - 05:25:16 EST


On Tue, Jun 25, 2019 at 11:47:06PM +0200, Thomas Gleixner wrote:
> > On Tue, Jun 25, 2019 at 09:53:09PM +0200, Thomas Gleixner wrote:

> > > but it also makes objtool unhappy:
> > >
> > > arch/x86/events/intel/core.o: warning: objtool: intel_pmu_nhm_workaround()+0xb3: unreachable instruction
> > > kernel/fork.o: warning: objtool: free_thread_stack()+0x126: unreachable instruction
> > > mm/workingset.o: warning: objtool: count_shadow_nodes()+0x11f: unreachable instruction
> > > arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: get_fixed_ranges()+0x9b: unreachable instruction
> > > arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction
> > > drivers/iommu/irq_remapping.o: warning: objtool: irq_remap_enable_fault_handling()+0x1d: unreachable instruction

> I just checked two of them in the disassembly. In both cases it's jump
> label related. Here is one:
>
> asm volatile("1: rdmsr\n"
> 410: b9 59 02 00 00 mov $0x259,%ecx
> 415: 0f 32 rdmsr
> 417: 49 89 c6 mov %rax,%r14
> 41a: 48 89 d3 mov %rdx,%rbx
> return EAX_EDX_VAL(val, low, high);
> 41d: 48 c1 e3 20 shl $0x20,%rbx
> 421: 48 09 c3 or %rax,%rbx
> 424: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> 429: eb 0f jmp 43a <get_fixed_ranges+0xaa>
> do_trace_read_msr(msr, val, 0);
> 42b: bf 59 02 00 00 mov $0x259,%edi <------- "unreachable"
> 430: 48 89 de mov %rbx,%rsi
> 433: 31 d2 xor %edx,%edx
> 435: e8 00 00 00 00 callq 43a <get_fixed_ranges+0xaa>
> 43a: 44 89 35 00 00 00 00 mov %r14d,0x0(%rip) # 441 <get_fixed_ranges+0xb1>
>
> Interestingly enough there are some more hunks of the same pattern in that
> function which all look the same. Those are not upsetting objtool. Josh
> might give a hint where to stare at.

That's pretty atrocious code-gen :/ Does LLVM support things like label
attributes? Back when we did jump labels GCC didn't, or rather, it
ignored them completely when combined with asm goto (and it might still).

That is, would something like this:

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 06c3cc22a058..1761b1e76ddc 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		: :  "i" (key), "i" (branch) : : l_yes);

 	return false;
-l_yes:
+l_yes: __attribute__((cold));
 	return true;
 }

@@ -49,7 +49,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 		: :  "i" (key), "i" (branch) : : l_yes);

 	return false;
-l_yes:
+l_yes: __attribute__((hot));
 	return true;
 }

Help LLVM?

Still, objtool should be able to deal with that code.

> Just for the fun of it I looked at the GCC output of the same file. It
> takes a different approach:
>
> asm volatile("1: rdmsr\n"
> c70: b9 59 02 00 00 mov $0x259,%ecx
> c75: 0f 32 rdmsr
> return EAX_EDX_VAL(val, low, high);
> c77: 48 c1 e2 20 shl $0x20,%rdx
> c7b: 48 89 d3 mov %rdx,%rbx
> c7e: 48 09 c3 or %rax,%rbx
> c81: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> c86: 48 89 1d 00 00 00 00 mov %rbx,0x0(%rip) # c8d <get_fixed_ranges.constprop.5+0x7d>
>
> and the tracing code is completely out of line:
>
> do_trace_read_msr(msr, val, 0);
> ce2: 31 d2 xor %edx,%edx
> ce4: 48 89 de mov %rbx,%rsi
> ce7: bf 59 02 00 00 mov $0x259,%edi
> cec: e8 00 00 00 00 callq cf1 <get_fixed_ranges.constprop.5+0xe1>
> cf1: eb 93 jmp c86 <get_fixed_ranges.constprop.5+0x76>
>
> which makes a lot of sense as the normal path (tracepoint disabled) just
> runs through linearly while in the clang version it has to jump around the
> tracepoint code.
>
> The jump itself is not a problem, but what matters is the I-cache
> footprint. The GCC version hotpath fits in 3 cache lines while the Clang
> version unconditionally eats 4.2 of them. That's a huge difference.

Yeah, this is the right and expected code-gen.
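
The layout described above (hot path runs straight through, tracing call
placed out of line) can be approximated in plain C without jump labels. The
sketch below is only an illustration under stated assumptions: trace_enabled,
trace_calls, sketch_trace_read() and sketch_read() are hypothetical names, an
ordinary variable stands in for the static key, and the MSR halves are passed
in as arguments instead of being read with rdmsr. Whether the cold call
really ends up out of line depends on the compiler and optimisation level.

```c
#include <stdint.h>

static int trace_enabled;		/* hypothetical stand-in for the static key */
static unsigned int trace_calls;	/* counts how often the slow path ran */

/* Marked cold and noinline so the compiler can keep it off the hot path. */
__attribute__((cold, noinline))
static void sketch_trace_read(uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
	trace_calls++;
}

static uint64_t sketch_read(uint32_t msr, uint32_t low, uint32_t high)
{
	/* Same combination as EAX_EDX_VAL(): high half shifted up, OR'ed with low. */
	uint64_t val = ((uint64_t)high << 32) | low;

	/* Tell the compiler the tracing branch is the unlikely one. */
	if (__builtin_expect(trace_enabled, 0))
		sketch_trace_read(msr, val);

	return val;
}
```

With tracing disabled, sketch_read(0x259, 1, 2) returns 0x200000001 and never
enters sketch_trace_read(); setting trace_enabled to 1 makes each call take
the cold path once.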