Re: [PATCH v9 04/10] x86: refcount: prevent gcc distortions

From: Nadav Amit
Date: Thu Oct 04 2018 - 06:23:37 EST


at 2:45 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:

>
> * Nadav Amit <namit@xxxxxxxxxx> wrote:
>
>>> Another, separate question I wanted to ask: how do we ensure that the kernel stays fixed?
>>> I.e. is there some tooling we can use to actually measure whether there's bad inlining decisions
>>> done, to detect all these bad patterns that cause bad GCC code generation?
>>
>> Good question. First, Iâll indicate that this patch-set does not handle all
>> the issues. There is still the issue of conditional use of
>> __builtin_constant_p().
>>
>> One indication for bad inlining decisions is the inlined functions have
>> multiple (non-inlined) instances in the binary and are short. I donât
>> have an automatic solution, but you can try, for example to run:
>>
>> nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | uniq -c | \
>> grep -v '^ 1' | sort -n -r | head -n 5
>>
>> There are however many false positives. After these patches, for example, I
>> get:
>>
>> 11 000000000000012f t jhash
>> 7 0000000000000017 t dst_output
>> 6 0000000000000011 t kzalloc
>> 5 000000000000002f t acpi_os_allocate_zeroed
>> 5 0000000000000029 t acpi_os_allocate
>>
>>
>> jhash() should not have been inlined in my mind, and should have a
>> non-inlined implementation. dst_output() is used as a function pointer.
>> kzalloc() and the next two suffer from the __builtin_constant_p() problem I
>> described in the past.
>
> Ok, that's useful info.
>
> The histogram suggests that with all your patches applied the kernel is now in a pretty good
> state in terms of inlining decisions, right?

It was just an example that I ran on the kernel I built right now (with a
custom config). Please donât regard these results as anything indicative.

> Are you using defconfig or a reasonable distro-config for your tests?

I think it is best to take the kernel and run localyesconfig for testing.

Taking Ubuntu 18.04 and doing the same gives the following results:

12 000000000000012f t jhash
7 0000000000000017 t dst_output
7 0000000000000014 t init_once
5 00000000000000d8 t jhash2
5 000000000000004e t put_page
5 000000000000002f t acpi_os_allocate_zeroed
5 0000000000000029 t acpi_os_allocate
5 0000000000000028 t umask_show
5 0000000000000011 t kzalloc
4 0000000000000053 t trace_xhci_dbg_quirks

Not awful, but not great.

It is a bit hard to fix the __builtin_constant_p() problem without having
some negative side-effects.

Reminder: __builtin_constant_p() is evaluated after the inlining decision
are done. You can use __builtin_choose_expr() instead of an âifâs and
instead of ternary operators when evaluating __builtin_constant_p() to solve
the problem. However, this causes the compiler not to know sometimes that a
value is constant because __builtin_choose_expr () evaluation happens too
early. This __builtin_choose_expr() problem is the reason for put_page() and
kzalloc() are not being inlined. Clang, again, does not suffer from this
problem.

Anyhow, it may be a good practice to try to get rid of the rest. For
example, dst_discard() has four instances because it is always given as a
function pointer. So it should not have been inlined.

Regards,
Nadav