Re: [RFC PATCH] x86/64: Optimize the effective instruction cache footprint of kernel functions

From: Denys Vlasenko
Date: Sat Apr 16 2016 - 17:09:14 EST


On Thu, May 21, 2015 at 1:38 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> On 05/20/2015 02:21 PM, Denys Vlasenko wrote:
>> So what we need is to put something like ".p2align 6,,7"
>> before every function.
>>
>> (
>> Why 7?
>>
>> defconfig vmlinux (w/o FRAME_POINTER) has 42141 functions.
>> 6923 of them have 1st insn 5 or more bytes long,
>> 5841 of them have 1st insn 6 or more bytes long,
>> 5095 of them have 1st insn 7 or more bytes long,
>> 786 of them have 1st insn 8 or more bytes long,
>> 548 of them have 1st insn 9 or more bytes long,
>> 375 of them have 1st insn 10 or more bytes long,
>> 73 of them have 1st insn 11 or more bytes long,
>> one of them has 1st insn 12 bytes long:
>> this "heroic" instruction is in local_touch_nmi()
>> 65 48 c7 05 44 3c 00 7f 00 00 00 00
>> movq $0x0,%gs:0x7f003c44(%rip)
>>
>> Thus ensuring that at least the first seven bytes do not cross
>> a 64-byte boundary would cover >98% of all functions.
>> )
>>
>> gcc can't do that right now. With -falign-functions=N,
>> it emits ".p2align log2(next_power_of_2(N)),,N-1"
>>
>> We need to make it just a tiny bit smarter.
>>
>>> We'd need toolchain help to do saner alignment.
>>
>> Yep.
>> I'm going to create a gcc BZ with a feature request,
>> unless you disagree with my musings above.
>
> The BZ is here:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66240

...and now this BZ has a working patch, which implements e.g.
-falign-functions=64,7
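
For anyone who wants to check the arithmetic, here is a rough sketch of the
padding rule discussed above. It is not taken from the gcc patch. It assumes
that "64,7" means "pad to the next 64-byte boundary only when the first 7
bytes would otherwise cross it", and the function names are made up for
illustration. The function counts are the ones quoted above.

```python
# Sketch only: assumed semantics of "align to 64, protect the first 7 bytes".
# padding_needed() and p2align_directive() are hypothetical helper names.

def padding_needed(offset, align=64, first_bytes=7):
    """Return how many pad bytes to insert before a function at `offset`
    so that its first `first_bytes` bytes stay within one `align`-byte line."""
    assert align > 0 and align & (align - 1) == 0, "align must be a power of two"
    into_line = offset % align
    if into_line + first_bytes <= align:
        return 0                      # first bytes already fit in this line
    return align - into_line          # skip to the start of the next line


def p2align_directive(align=64, first_bytes=7):
    """Build the matching gas directive. .p2align takes log2 of the alignment
    as its first operand and the maximum number of bytes to skip as its third."""
    exponent = align.bit_length() - 1
    return f".p2align {exponent},,{first_bytes - 1}"


# Numbers quoted above: 42141 functions in total, of which 786 start with an
# instruction that is 8 or more bytes long (so 7 protected bytes are too few).
TOTAL_FUNCTIONS = 42141
FIRST_INSN_8_OR_MORE = 786
coverage = 1 - FIRST_INSN_8_OR_MORE / TOTAL_FUNCTIONS

if __name__ == "__main__":
    for off in (0, 57, 58, 63, 64):
        print(off, padding_needed(off))
    print(p2align_directive())
    print(f"coverage: {coverage:.2%}")
```

Under this reading the worst case is 6 bytes of padding per function (start
offsets 58..63 within a line), which is why the max-skip operand comes out as
first_bytes - 1.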