Inlining can be _very_ bad...
From: J.A. Magallón
Date: Wed Mar 28 2007 - 19:18:56 EST
Hi all...
I post this here as it can be of direct interest for kernel development
(as I recall many discussions about inlining yes or no...).
While testing other problems, I finally ran into this issue: the same short
and stupid loop took 3 to 5 times longer when it was in main() than
when it was in an out-of-line function. The same (bad thing) happens if
the function is inlined.
The basic code is like this:
float data[];

[inline] double one()
{
    double sum;
    sum = 0;
    for (i=0; i<SIZE; i++) sum += data[i];
    return sum;
}

int main()
{
    gettimeofday(&tv0,0);
    for (i=0; i<SIZE; i++)
        s0 += data[i];
    gettimeofday(&tv1,0);
    printf("T0: %6.2f ms\n",elap(tv0,tv1));
    gettimeofday(&tv0,0);
    s1 = one();
    gettimeofday(&tv1,0);
    printf("T1: %6.2f ms\n",elap(tv0,tv1));
}
The times if one() is not inlined (EM64T, 2.33GHz):
apolo:~/e4> tst
T0: 1145.12 ms
S0: 268435456.00
T1: 457.19 ms
S1: 268435456.00
With one() inlined:
apolo:~/e4> tst
T0: 1200.52 ms
S0: 268435456.00
T1: 1200.14 ms
S1: 268435456.00
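The T0/T1 figures above are differences between two gettimeofday() readings, converted to milliseconds. As a minimal sketch of that conversion written as a function instead of a macro (the name elapsed_ms is hypothetical and does not appear in tst.c), subtracting the fields first keeps the intermediate values small:

```c
#include <sys/time.h>

/* Hypothetical helper, not part of the original tst.c:
 * elapsed time from *t0 to *t1 in milliseconds.
 * The seconds and microseconds are subtracted before scaling,
 * so no large intermediate product is formed. */
double elapsed_ms(const struct timeval *t0, const struct timeval *t1)
{
    double sec  = (double)(t1->tv_sec  - t0->tv_sec);
    double usec = (double)(t1->tv_usec - t0->tv_usec);

    return 1000.0 * sec + usec / 1000.0;
}
```

It would be called as elapsed_ms(&tv0, &tv1) in place of elap(tv0,tv1).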
Looking at the assembler, the non-inlined version does:
.L2:
cvtss2sd (%rdx,%rax,4), %xmm0
incq %rax
cmpq $268435456, %rax
addsd %xmm0, %xmm1
jne .L2
and the inlined version does:
.L13:
cvtss2sd (%rdx,%rax,4), %xmm0
incq %rax
cmpq $268435456, %rax
addsd 8(%rsp), %xmm0
movsd %xmm0, 8(%rsp)
jne .L13
It looks like it is updating the stack on each iteration... This is -march=opteron
code; the -march=pentium4 code is similar. Same behaviour with gcc3 and gcc4.
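For anyone who wants to experiment with keeping the loop out of line regardless of the optimizer's inlining decisions, here is a minimal sketch using GCC's noinline function attribute. The function name sum_floats and its parameters are hypothetical, and whether this yields the register-only loop shown above depends on the compiler version and flags; it has not been measured here.

```c
#include <stddef.h>

/* Hypothetical variant of one(), not taken from tst.c.
 * __attribute__((noinline)) is a GCC extension that asks the
 * compiler not to inline this function into its callers. */
__attribute__((noinline))
double sum_floats(const float *values, size_t n)
{
    double sum = 0.0;   /* local accumulator */
    size_t i;

    for (i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```

In the test program this would be called as s1 = sum_floats(data, SIZE); comparing the generated assembler with and without the attribute shows whether the accumulator stays in a register.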
tst.c and Makefile attached.
Nice, isn't it? Please point out where my mistake is...
--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2007.1 (Cooker) for i586
Linux 2.6.20-jam06 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT
Attachment: Makefile (binary data, not shown)
Attachment: tst.c
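#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Number of floats summed; parenthesized so the macro expands safely. */
#define SIZE (256*1024*1024)

/* Elapsed time between two struct timeval values, in milliseconds.
 * 1000.0 forces the arithmetic into double. */
#define elap(t0,t1) \
    ((1000.0*(t1).tv_sec+0.001*(t1).tv_usec) - (1000.0*(t0).tv_sec+0.001*(t0).tv_usec))

double one();
float *data;

/* Define INLINE (e.g. -DINLINE) to request inlining of one(). */
#ifdef INLINE
inline
#endif
double one()
{
    int i;
    double sum;

    sum = 0;
    asm("#FBGN");   /* marks the start of the loop in the assembler output */
    for (i=0; i<SIZE; i++)
        sum += data[i];
    asm("#FEND");   /* marks the end of the loop */
    return sum;
}

int main(int argc,char** argv)
{
    struct timeval tv0,tv1;
    double s0,s1;
    int i;

    data = malloc(SIZE*sizeof(float));
    if (!data) {
        perror("malloc");
        return 1;
    }
    for (i=0; i<SIZE; i++)
        data[i] = 1;

    /* T0: the loop written directly in main() */
    gettimeofday(&tv0,0);
    s0 = 0;
    asm("#MBGN");
    for (i=0; i<SIZE; i++)
        s0 += data[i];
    asm("#MEND");
    gettimeofday(&tv1,0);
    printf("T0: %6.2f ms\n",elap(tv0,tv1));
    printf("S0: %0.2lf\n",s0);

    /* T1: the same loop inside one() */
    gettimeofday(&tv0,0);
    s1 = one();
    gettimeofday(&tv1,0);
    printf("T1: %6.2f ms\n",elap(tv0,tv1));
    printf("S1: %0.2lf\n",s1);

    free(data);
    return 0;
}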