Re: [PATCH] perf/x86/intel: Mark expected switch fall-throughs

From: Thomas Gleixner
Date: Tue Jun 25 2019 - 17:47:31 EST


Nathan,

On Tue, 25 Jun 2019, Nathan Chancellor wrote:
> On Tue, Jun 25, 2019 at 09:53:09PM +0200, Thomas Gleixner wrote:
> >
> > But can the script please check for a minimal clang version required to
> > build that thing.
> >
> > The default clang-3.8 which is installed on Debian stretch explodes. The
> > 6.0 variant from backports works as advertised.
> >
>
> Hmmm interesting, I test a lot of different distros using Docker
> containers to make sure the script works universally and that includes
> Debian stretch, which is the stress tester because all of the packages
> are older. I install the following packages then run the following
> command and it works fine for me (just tested):
>
> $ apt update && apt install -y --no-install-recommends ca-certificates \
> ccache clang cmake curl file gcc g++ git make ninja-build python3 \
> texinfo zlib1g-dev
> $ ./build-llvm.py
>
> If you could give me a build log, I'd be happy to look into it and see
> what I can do.

I can produce one tomorrow.

> > Kernel builds with the new shiny compiler. Jump labels seem to be enabled.
> >
> > It complains about a few type conversions:
> >
> > arch/x86/kvm/mmu.c:4596:39: warning: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Wconstant-conversion]
> > u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
> > ~~ ^~
> >
>
> Yes, there was a patch sent to try and fix this but it was rejected by
> the maintainers:
>
> https://github.com/ClangBuiltLinux/linux/issues/95
>
> https://lore.kernel.org/lkml/20180619192504.180479-1-mka@xxxxxxxxxxxx/

Just looked through it. I don't think it's an outright reject. Paolo was
not totally against it and then the whole discussion degraded into bikeshed
painting and bitching about compiler error messages. Try again or should I?
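
For anyone following along, here is a minimal user-space sketch of why clang
emits that warning. It assumes w is 0xcc, which is the value implied by the
numbers in the warning text, and the helper names are made up for
illustration. It is neither the kvm code nor the patch from that thread.

```c
#include <stdint.h>

typedef uint8_t u8;

/*
 * Hypothetical helper: the value of ~w before it is stored anywhere.
 * The u8 operand is promoted to int first, so the upper bits are set too.
 */
static int not_promoted(u8 w)
{
	return ~w;
}

/* Hypothetical helper: the same operation with the narrowing spelled out. */
static u8 not_truncated(u8 w)
{
	return (u8)~w;
}
```

With w = 0xcc, not_promoted() yields -205 and not_truncated() yields 51
(0x33), the two values clang mentions. The cast only makes the narrowing
explicit; whether that is the right change for mmu.c is a separate question.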

> > but it also makes objtool unhappy:
> >
> > arch/x86/events/intel/core.o: warning: objtool: intel_pmu_nhm_workaround()+0xb3: unreachable instruction
> > kernel/fork.o: warning: objtool: free_thread_stack()+0x126: unreachable instruction
> > mm/workingset.o: warning: objtool: count_shadow_nodes()+0x11f: unreachable instruction
> > arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: get_fixed_ranges()+0x9b: unreachable instruction
> > arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction
> > drivers/iommu/irq_remapping.o: warning: objtool: irq_remap_enable_fault_handling()+0x1d: unreachable instruction

> Unfortunately, we have quite a few of those outstanding, it's probably
> time to start really taking a look at them:
>
> https://github.com/ClangBuiltLinux/linux/labels/objtool

I just checked two of them in the disassembly. In both cases it's jump
label related. Here is one:

asm volatile("1: rdmsr\n"
410: b9 59 02 00 00 mov $0x259,%ecx
415: 0f 32 rdmsr
417: 49 89 c6 mov %rax,%r14
41a: 48 89 d3 mov %rdx,%rbx
return EAX_EDX_VAL(val, low, high);
41d: 48 c1 e3 20 shl $0x20,%rbx
421: 48 09 c3 or %rax,%rbx
424: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
429: eb 0f jmp 43a <get_fixed_ranges+0xaa>
do_trace_read_msr(msr, val, 0);
42b: bf 59 02 00 00 mov $0x259,%edi <------- "unreachable"
430: 48 89 de mov %rbx,%rsi
433: 31 d2 xor %edx,%edx
435: e8 00 00 00 00 callq 43a <get_fixed_ranges+0xaa>
43a: 44 89 35 00 00 00 00 mov %r14d,0x0(%rip) # 441 <get_fixed_ranges+0xb1>

Interestingly enough there are some more hunks of the same pattern in that
function which all look the same. Those are not upsetting objtool. Josh
might give a hint where to stare at.

Just for the fun of it I looked at the GCC output of the same file. It
takes a different approach:

asm volatile("1: rdmsr\n"
c70: b9 59 02 00 00 mov $0x259,%ecx
c75: 0f 32 rdmsr
return EAX_EDX_VAL(val, low, high);
c77: 48 c1 e2 20 shl $0x20,%rdx
c7b: 48 89 d3 mov %rdx,%rbx
c7e: 48 09 c3 or %rax,%rbx
c81: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
c86: 48 89 1d 00 00 00 00 mov %rbx,0x0(%rip) # c8d <get_fixed_ranges.constprop.5+0x7d>

and the tracing code is completely out of line:

do_trace_read_msr(msr, val, 0);
ce2: 31 d2 xor %edx,%edx
ce4: 48 89 de mov %rbx,%rsi
ce7: bf 59 02 00 00 mov $0x259,%edi
cec: e8 00 00 00 00 callq cf1 <get_fixed_ranges.constprop.5+0xe1>
cf1: eb 93 jmp c86 <get_fixed_ranges.constprop.5+0x76>

which makes a lot of sense as the normal path (tracepoint disabled) just
runs through linearly while in the clang version it has to jump around the
tracepoint code.
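
The layout difference can be approximated in plain C. This is only a
user-space sketch: it uses a regular branch with __builtin_expect() and a
cold, noinline function instead of the kernel's jump labels, and all names
in it are hypothetical. Where the compiler actually places the cold call is
still up to the compiler.

```c
#include <stdbool.h>
#include <stdint.h>

static bool trace_enabled;        /* stand-in for a tracepoint's enabled state */
static unsigned long trace_calls; /* counts how often the slow path ran */

/* Rarely executed slow path; 'cold' hints that it should live out of line. */
__attribute__((cold, noinline))
static void trace_read(uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
	trace_calls++;
}

/* Hot path: combine the two halves, branch to the slow path only if enabled. */
static uint64_t read_value(uint32_t msr, uint32_t low, uint32_t high)
{
	uint64_t val = low | ((uint64_t)high << 32);

	if (__builtin_expect(trace_enabled, 0))
		trace_read(msr, val);

	return val;
}
```

With trace_enabled left false, read_value(0x259, 1, 2) returns 0x200000001
without touching trace_read(). Comparing the objdump output of such a file
from both compilers shows whether the call ends up inline or out of line.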

The jump itself is not a problem, but what matters is the I-cache
footprint. The GCC version hotpath fits in 3 cache lines while the Clang
version unconditionally eats 4.2 of them. That's a huge difference.
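
Counting lines for a given byte range is simple arithmetic. The sketch below
assumes 64-byte cache lines and uses a hypothetical helper name. Offsets in
an object file are not final addresses, so results for a .o are only a rough
indication.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumption: 64-byte lines */

/* Number of cache lines touched by the byte range [start, start + len). */
static size_t cache_lines_touched(uintptr_t start, size_t len)
{
	uintptr_t first, last;

	if (len == 0)
		return 0;

	first = start / CACHE_LINE_SIZE;
	last = (start + len - 1) / CACHE_LINE_SIZE;

	return (size_t)(last - first + 1);
}
```

A two byte range starting at offset 63 straddles a line boundary and counts
as two lines, which is why placement matters as much as raw code size.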

> Thanks for trying it out and letting us know. Please keep us in the loop
> if you happen to find anything amiss.

Will do.

Thanks,

tglx