Re: [patch 13/13] lseek speedup

From: Anton Altaparmakov (aia21@cantab.net)
Date: Wed Jul 17 2002 - 05:54:35 EST


Hi Andrew,

At 06:31 17/07/02, Andrew Morton wrote:
> static inline loff_t llseek(struct file *file, loff_t offset, int origin)
> {
>- loff_t (*fn)(struct file *, loff_t, int);
>-
>- fn = default_llseek;
> if (file->f_op && file->f_op->llseek)
>- fn = file->f_op->llseek;
>- return fn(file, offset, origin);
>+ return (*file->f_op->llseek)(file, offset, origin);
>+ return default_llseek(file, offset, origin);
> }

This one is interesting. I have been wondering for quite a while whether
constructs like the original one actually produce better machine code or
whether they should all be cleaned up just as you do here. I was never
bothered enough about it to try but I have half an hour spare at the moment
so I tried both - using gcc-2.96, RedHat 7.3 latest version, ia32,
compiling for Athlon.

While I cannot speak for other compilers, gcc-2.96 at least generates
better code for the old llseek() than for the new llseek() proposed by the
above patch snippet.

The old code has following advantages:
         - Has only one conditional jump (comparing to two in new code.
         - Has no unconditional jump (comparing to one in new code).
         - Uses 16 bytes less stack space.
         - Machine code is shorter.

Here are the relevant different sections in sys_llseek() taken from the gcc
generated assembly files both with and without just the above patch snippet
applied:

----old code----
         movl $default_llseek, %esi
         movl 16(%ebx), %eax
         testl %eax, %eax
         je .L1427
         movl 4(%eax), %eax
         testl %eax, %eax
         cmovne %eax, %esi
.L1427:
         pushl 48(%esp)
         pushl %ecx
         pushl %edx
         pushl 12(%esp)
         call *%esi

----new code----
         movl 16(%ebx), %eax
         testl %eax, %eax
         je .L1428
         movl 4(%eax), %eax
         testl %eax, %eax
         je .L1428
         pushl 64(%esp)
         pushl %ecx
         pushl %edx
         pushl %ebx
         call *%eax
         jmp .L1442
         .p2align 4,,7
.L1428:
         pushl 64(%esp)
         pushl %ecx
         pushl %edx
         pushl 12(%esp)
         call default_llseek
.L1442:

Obviously we are counting a few cycles difference only (although the fewer
jmps could make a bigger difference if the branch prediction of the CPU
doesn't match to the common usage patch and the pipe line is stalled) and
sys_llseek() is not exactly the most performance critical code, however I
thought it is interesting to note that the old code is a "coding style"
which produces better machine code in general with (gcc-2.96)...

And also it provided me with something fun to do in the half hour I am
waiting for my gel to run. (-8 Back to lab work...

Best regards,

         Anton

-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Jul 23 2002 - 22:00:22 EST