Re: [patch v2] epoll use a single inode ...

From: Eric Dumazet
Date: Tue Mar 06 2007 - 13:11:05 EST


On Tuesday 06 March 2007 18:28, Eric Dumazet wrote:
> On Tuesday 06 March 2007 18:19, Linus Torvalds wrote:
>
> > > Using reciprocal divides permits to change each divide by two
> > > multiplies, less expensive on current CPUS.
> >
> > Are you sure?
>
> I am going to test this, but at least on Opterons, the reciprocal divide I
> added into mm/slab.c gave me a nice speedup.
>

With attached test program (one million calls to pipe()), I got about 0.1 s of
speedup on my 1.6 GHz Pentium(M) dell D610 machine
(3.350 s instead of 3.450 s, on many runs)

Thats about 100 ns saved per number() call

But then I realized that on ia32, gcc compilers is not very smart :

static inline u32 reciprocal_divide(u32 A, u32 R)
{
return (u32)(((u64)A * R) >> 32);
}

It generates two multiplies... arg...

//begin of reciprocal_divide()
4b0: 8b 4c 24 28 mov 0x28(%esp),%ecx
4b4: 89 f0 mov %esi,%eax
4b6: f7 64 24 24 mull 0x24(%esp)
4ba: 0f af ce imul %esi,%ecx
4bd: 8d 14 11 lea (%ecx,%edx,1),%edx
// end of reciprocal_divide()
4c0: 8b 8c 24 8c 00 00 00 mov 0x8c(%esp),%ecx
4c7: 89 d0 mov %edx,%eax
4c9: 0f af c8 imul %eax,%ecx
4cc: 29 ce sub %ecx,%esi
4ce: 8b 4c 24 1c mov 0x1c(%esp),%ecx
4d2: 0f b6 34 31 movzbl (%ecx,%esi,1),%esi
4d6: 89 f1 mov %esi,%ecx
4d8: 89 c6 mov %eax,%esi
4da: 88 0f mov %cl,(%edi)
4dc: 47 inc %edi
4dd: ff 44 24 20 incl 0x20(%esp)
4e1: 85 c0 test %eax,%eax
4e3: 75 cb jne 4b0 <number+0x160>

So even with a total of 3 multiplies per digit, we win...

Maybe some bit of x86 asm is needed to make gcc be smarter (using only one
multiply for reciprocal_divide())

/*
* micro benchmark to time calls to pipe()/close()
*/
main()
{
int fd[100*2];
unsigned int l, i;

for (l = 0 ; l < 10000 ; l++) {
for (i = 0 ; i < 100*2 ; i+=2)
pipe(fd + i);
for (i = 0 ; i < 100*2 ; i++)
close(fd[i]);
}
}