Re: [PATCH] force inlining of spinlock ops

From: Denys Vlasenko
Date: Tue May 12 2015 - 07:03:13 EST

On 05/12/2015 09:44 AM, Ingo Molnar wrote:
> * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>> With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline
>> very small functions we expect to be inlined. In particular,
>> with this config:
>> there are more than a thousand copies of tiny spinlock-related functions:
> That's an x86-64 allyesconfig AFAICS, right?

Close, but I disabled options which are clearly "heavy debugging" stuff.
IOW: many developers run their work machines with lock debugging etc.,
but few would constantly use something which slows the kernel down by a factor of 3!

So, CONFIG_KASAN is off. CONFIG_STAGING is also off. And a few others I forgot.

I'm using this config to see which inlines should be deinlined.
For that, I need to cover all callsites of each inline.
Thus, I need ~allyesconfig.

The discovery that there also exists the opposite problem (wrongly
*un*inlined functions) was accidental.

> It's not mysterious, but an effect of -Os plus allowing GCC to do
> inlining heuristics:
> Does the problem go away if you unset these config options?

The problem greatly diminishes, but is not eliminated.
Testing allyesconfig would take too long, so I just took defconfig.

On defconfig kernel, the following functions below 16 bytes
of machine code are auto-deinlined:

#Calls_ Size(hex)_______ Name____________________
      7 000000000000000b t hweight_long
      5 000000000000000f t init_once
      4 000000000000000d t cpumask_set_cpu
      4 000000000000000b t udp_lib_close
      4 0000000000000006 t udp_lib_hash
      3 000000000000000a t nofill
      3 0000000000000006 t sg_set_page.part.7
      2 000000000000000f t udplite_sk_init
      2 000000000000000f t ct_seq_next
      2 000000000000000e t encode_cookie
      2 000000000000000d t ktime_get_real
      2 000000000000000b t spin_lock
      2 000000000000000b t device_create_release
      2 000000000000000b t cpu_smt_flags
      2 000000000000000b t cpu_core_flags
      2 0000000000000009 t default_write_file
      2 0000000000000008 t __initcall_pl_driver_init6
      2 0000000000000008 t __initcall_nf_defrag_init6
      2 0000000000000008 t __initcall_hid_init6
      2 0000000000000008 t __initcall_ch_driver_init6
      2 0000000000000008 t default_read_file
      2 0000000000000006 t wiphy_to_rdev.part.4
      2 0000000000000006 t s_stop
      2 0000000000000006 t sg_set_page.part.3
      2 0000000000000006 t generic_print_tuple
      2 0000000000000006 t exp_seq_stop
      2 0000000000000006 t ct_seq_stop
      2 0000000000000006 t ct_cpu_seq_stop
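
For anyone who wants to reproduce a listing like the one above, here is a
minimal sketch of the filtering step. It assumes input lines of the form
"size type name" (the listing above minus the call count); the helper name
tiny_text_symbol() and the 128-byte name buffer are hypothetical, and this is
not necessarily how the table above was generated.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one "size type name" line and report whether
 * it describes a local text symbol ('t') smaller than `limit` bytes.
 * Returns 1 and fills size_out/name_out on a match, 0 otherwise.
 */
int tiny_text_symbol(const char *line, unsigned long limit,
                     unsigned long *size_out, char name_out[128])
{
    unsigned long size;
    char type;
    char name[128];

    /* Size is hexadecimal; the space before %c skips whitespace. */
    if (sscanf(line, "%lx %c %127s", &size, &type, name) != 3)
        return 0;
    if (type != 't' || size >= limit)
        return 0;

    *size_out = size;
    snprintf(name_out, 128, "%s", name);
    return 1;
}
```

Counting how many times each matching name repeats would then give the
#Calls column.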

In particular, one of the functions from my patches,
spin_lock(), has been auto-deinlined:

ffffffff8108adb0 <spin_lock>:
ffffffff8108adb0: 55 push %rbp
ffffffff8108adb1: 48 89 e5 mov %rsp,%rbp
ffffffff8108adb4: e8 37 db 81 00 callq ffffffff818a88f0 <_raw_spin_lock>
ffffffff8108adb9: 5d pop %rbp
ffffffff8108adba: c3 retq
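
To illustrate the difference between a plain "inline" hint and forced
inlining, here is a small userspace sketch. The toy_* names are hypothetical
and the macro is only modeled on the kernel's __always_inline; this is not
the patch itself.

```c
/* Hypothetical stand-in, modeled on the kernel's __always_inline macro. */
#define toy_always_inline inline __attribute__((__always_inline__))

struct toy_lock {
    int locked;
};

/* Out-of-line primitive, playing the role of _raw_spin_lock(). */
void toy_raw_lock(struct toy_lock *l)
{
    l->locked = 1;
}

/* Plain inline: only a hint, so gcc may still emit an out-of-line copy. */
static inline void toy_lock_plain(struct toy_lock *l)
{
    toy_raw_lock(l);
}

/* Forced inline: gcc is told to expand the body at every call site. */
static toy_always_inline void toy_lock_forced(struct toy_lock *l)
{
    toy_raw_lock(l);
}
```

Both wrappers behave identically; the attribute only takes the inlining
decision away from the compiler's heuristics.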

> Furthermore, what is the size win on x86 defconfig with these options
> set?


Size difference for CC_OPTIMIZE_FOR_SIZE:

    text     data      bss       dec     hex filename
12335864  1746152  1081344  15163360  e75fe0 vmlinux.CC_OPTIMIZE_FOR_SIZE=y
10373764  1684200  1077248  13135212  c86d6c vmlinux.CC_OPTIMIZE_FOR_SIZE=n

Decrease by about 19%.
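
The arithmetic behind that figure can be checked with a small sketch. The
helper name pct_of() is hypothetical. Which base the percentage is taken
against is an assumption: with the two text sizes above, the difference is
about 19% of the smaller one and about 16% of the larger one.

```c
/*
 * Hypothetical helper: absolute difference between a and b,
 * expressed as a percentage of `base`.
 */
double pct_of(unsigned long a, unsigned long b, unsigned long base)
{
    unsigned long diff = a > b ? a - b : b - a;

    return 100.0 * (double)diff / (double)base;
}
```

For example, pct_of(12335864UL, 10373764UL, 10373764UL) takes the text
column difference relative to the smaller kernel.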

