Re: [PATCH] x86: Deinline cpuid_eax and friends

From: Denys Vlasenko
Date: Wed May 06 2015 - 15:10:36 EST


On 05/06/2015 08:59 PM, H. Peter Anvin wrote:
> On 05/06/2015 10:07 AM, Denys Vlasenko wrote:
>> cpuid_e{a,b,c,d}x() functions compile to 44 bytes of machine code each.
>> On x86 allyesconfig build they have 48 callsites.
>> Deinlining all four of them shrinks kernel by about 1k:
>>
>>     text     data      bss       dec     hex filename
>> 82434909 22255384 20627456 125317749 7783275 vmlinux.before
>> 82433898 22255384 20627456 125316738 7782e82 vmlinux
>>
>> Speed impact: CPUID instruction takes from 50 to 350+ cycles,
>> call overhead is negligible in comparison.
>
> How on Earth does it make 44 bytes? Is this due to paravirt_fail?

No, just this construct

    unsigned int eax, ebx, ecx, edx;
    cpuid(op, &eax, &ebx, &ecx, &edx);

is not really that cheap to set up. You need to allocate the
variables on the stack and take the address of each:

ffffffff81063668 <cpuid_eax>:
ffffffff81063668:  55                    push   %rbp
ffffffff81063669:  48 89 e5              mov    %rsp,%rbp
ffffffff8106366c:  48 83 ec 10           sub    $0x10,%rsp
ffffffff81063670:  48 8d 4d fc           lea    -0x4(%rbp),%rcx
ffffffff81063674:  89 7d f0              mov    %edi,-0x10(%rbp)
ffffffff81063677:  48 8d 55 f8           lea    -0x8(%rbp),%rdx
ffffffff8106367b:  48 8d 75 f4           lea    -0xc(%rbp),%rsi
ffffffff8106367f:  48 8d 7d f0           lea    -0x10(%rbp),%rdi
ffffffff81063683:  c7 45 f8 00 00 00 00  movl   $0x0,-0x8(%rbp)
ffffffff8106368a:  e8 3c ff ff ff        callq  ffffffff810635cb <__cpuid>
ffffffff8106368f:  8b 45 f0              mov    -0x10(%rbp),%eax
ffffffff81063692:  c9                    leaveq
ffffffff81063693:  c3                    retq
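
For anyone who wants to reproduce the effect outside the kernel tree, here is a rough userspace sketch of the same pattern. It is not the kernel code: my_cpuid4(), my_cpuid_eax_inline(), my_cpuid_eax() and the NOINLINE macro are made-up names, the inline asm assumes GCC or Clang on x86, and on other targets the helper falls back to a stub that leaves eax untouched. The point is only that the wrapper needs four stack slots and four address computations around the call, and that the out-of-line variant keeps one copy of that setup instead of one per call site.

```c
/* Illustrative userspace sketch; every name here is hypothetical. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Stand-in for an out-of-line __cpuid(): all four registers go through
 * pointers, so callers must give it four addressable objects. */
NOINLINE void my_cpuid4(unsigned int *eax, unsigned int *ebx,
                        unsigned int *ecx, unsigned int *edx)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("cpuid"
                         : "+a"(*eax), "=b"(*ebx), "+c"(*ecx), "=d"(*edx));
#else
    /* Assumption: non-x86 build, no CPUID. Leave *eax as passed in. */
    *ebx = 0;
    *edx = 0;
#endif
}

/* Inline wrapper: the stack allocation, the four address computations
 * and the call are duplicated at every call site. */
static inline unsigned int my_cpuid_eax_inline(unsigned int op)
{
    unsigned int eax = op, ebx, ecx = 0, edx;

    my_cpuid4(&eax, &ebx, &ecx, &edx);
    return eax;
}

/* Deinlined wrapper: the setup exists once; a call site only has to
 * load op into the first argument register and call. */
NOINLINE unsigned int my_cpuid_eax(unsigned int op)
{
    return my_cpuid_eax_inline(op);
}
```

Compiling this with optimization and running objdump -d on the result should show the lea sequence once inside my_cpuid_eax() and again in every caller of my_cpuid_eax_inline(); exact byte counts will depend on the compiler and flags.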

--
vda