Suboptimal inline heuristics due to non-code sections

From: Nadav Amit
Date: Tue May 01 2018 - 02:50:25 EST


When gcc estimates the size of a function for its inlining decisions, it
apparently takes *all* sections into account. Since the kernel extensively
uses sections for things other than code (e.g., the exception table and the
bug table), the optimality of these decisions seems questionable to me.
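To make this concrete for inline asm: as far as I can tell, gcc sizes an asm statement by counting the logical lines of its template (newlines and, on x86, ';' separators), without regard to which section each line ends up in. The sketch below is a simplified model of that counting, not gcc's actual source, and the template is a hypothetical, trimmed-down bug-table-style entry rather than the kernel's real macro.

```c
/*
 * Simplified model (an assumption, not gcc's source) of how an inline
 * asm statement is sized: one unit per logical line of the template,
 * where lines are separated by '\n' or by ';' (the x86 statement
 * separator). Which section a line lands in is never considered.
 */
static int estimate_asm_insns(const char *templ)
{
    int count = 1;

    if (*templ == '\0')
        return 0;  /* an empty template costs nothing */

    for (; *templ; templ++)
        if (*templ == '\n' || *templ == ';')
            count++;

    return count;
}

/*
 * Hypothetical, trimmed-down bug-table-style template: one real
 * instruction (ud2) followed by data pushed into __bug_table.
 */
static const char bug_like_template[] =
    "1:\tud2\n"
    ".pushsection __bug_table,\"aw\"\n"
    "2:\t.long 1b - 2b\n"
    "\t.long 0\n"
    "\t.word 0\n"
    ".popsection";
```

Under this model the single ud2 is charged 6 units, 5 of which are bug-table data that is never executed.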

objtool's sections may be the most extreme case: these sections are
discarded, yet their size still appears to be counted by the inlining
heuristics. It may be beneficial to ignore (some of) the other sections as
well, since they do not affect code caching and only increase the kernel
size.
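One way to picture the alternative is to count only the lines that stay in the current (code) section and to skip anything between .pushsection and .popsection. The function below is a hypothetical sketch of such a section-aware count; it is not something gcc provides, it assumes sections are not nested, and it does not handle .section/.previous. The objtool-style template is likewise a simplified, hypothetical stand-in.

```c
#include <string.h>

/*
 * Hypothetical section-aware size estimate for an inline asm template:
 * count logical lines (separated by '\n' or ';'), but skip every line
 * between ".pushsection" and ".popsection", since that content is not
 * emitted into the code section. Assumptions: directives start a line,
 * ".section"/".previous" are not handled, and labels count as lines.
 */
static int count_code_lines(const char *templ)
{
    int count = 0;
    int depth = 0;  /* > 0 while inside a .pushsection block */
    const char *line = templ;

    while (*line) {
        size_t len = strcspn(line, "\n;");
        const char *end = line + len;
        const char *s = line;

        /* Skip leading blanks so indented directives are recognized. */
        while (s < end && (*s == ' ' || *s == '\t'))
            s++;

        size_t n = (size_t)(end - s);
        if (n >= 12 && strncmp(s, ".pushsection", 12) == 0) {
            depth++;
        } else if (n >= 11 && strncmp(s, ".popsection", 11) == 0) {
            if (depth > 0)
                depth--;
        } else if (depth == 0 && n > 0) {
            count++;  /* a line that stays in the code section */
        }

        line = end;
        if (*line)
            line++;  /* step over the separator */
    }
    return count;
}

/* Simplified, hypothetical objtool-style annotation (not the kernel macro). */
static const char unreachable_like_template[] =
    "1:\tud2\n"
    ".pushsection .discard.unreachable\n"
    "\t.long 1b - .\n"
    ".popsection";
```

With this count, the annotated ud2 costs 1 unit regardless of how much data is pushed into the discarded section.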

To illustrate the issue, consider the function copy_overflow():

0xffffffff819315e0 <+0>: push %rbp
0xffffffff819315e1 <+1>: mov %rsi,%rdx
0xffffffff819315e4 <+4>: mov %edi,%esi
0xffffffff819315e6 <+6>: mov $0xffffffff820bc4b8,%rdi
0xffffffff819315ed <+13>: mov %rsp,%rbp
0xffffffff819315f0 <+16>: callq 0xffffffff81089b70 <__warn_printk>
0xffffffff819315f5 <+21>: ud2
0xffffffff819315f7 <+23>: pop %rbp
0xffffffff819315f8 <+24>: retq

This function seems to me like a great candidate for inlining. Yet, in my
4.16 build (using gcc 7.2), I get 38 non-inlined instances of this function
in vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number
of non-inlined instances to 35. Additionally removing the data that is saved
in __bug_table causes all instances of the function to be inlined.

Obviously this particular function can be marked __always_inline, but the
inlining heuristics seem wrongly biased to me.
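For reference, the __always_inline workaround looks roughly like this in a user-space analogue. The names warn_printk(), check_size() and warn_calls are hypothetical stand-ins (the kernel function ends up calling __warn_printk(), as the disassembly shows); the macro is defined here only if the C library has not already provided one.

```c
#include <stdio.h>

/* Assumption: glibc usually defines this already; otherwise define it. */
#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

/* Hypothetical stand-in for __warn_printk(): count calls, print a warning. */
static int warn_calls;

static void warn_printk(int size, unsigned long count)
{
    warn_calls++;
    fprintf(stderr, "Buffer overflow detected (%d < %lu)!\n", size, count);
}

/*
 * Forced inline: gcc inlines this at every call site regardless of its
 * size estimate (and reports an error if it cannot).
 */
static __always_inline void copy_overflow(int size, unsigned long count)
{
    warn_printk(size, count);
}

/* Hypothetical caller: returns 1 if count fits in size, else warns. */
static int check_size(int size, unsigned long count)
{
    if (size < 0 || (unsigned long)size < count) {
        copy_overflow(size, count);
        return 0;
    }
    return 1;
}
```

The trade-off is that the annotation has to be added function by function, whereas fixing the heuristics would help every such helper at once.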

What do you think?

Is there a way to make gcc ignore sections in its inlining heuristics?

Thanks,
Nadav