Re: zero-copy TCP

From: kumon@flab.fujitsu.co.jp
Date: Mon Sep 04 2000 - 01:56:16 EST


An experiment showed that the following prefetching can reduce the
execution time of csum_partial_copy_generic() by 20-30%.

This suggests that the CPU suffers heavy cache misses during checksum
calculation, which is also supported by PMC (performance monitoring
counter) measurements. So I think the following statement is not fully
acceptable.

Linus Torvalds writes:
> Proof: the data to be sent out is in RAM. In fact, often it is cached
> in the CPU these days. In order to start sending out the packet, the
> smart card has to move all of the data from RAM/cache over the bus to
> the card. It can only start actually sending after that. Cost: bus
> speed to copy it over.

I posted the patch a few times but received no response. I think now is
a good time to repost the code. Prefetching is implemented with
simple load instructions, so no dedicated instruction is needed, and it
is applicable to the Pentium Pro or later.
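
As a rough C illustration of prefetching by plain loads (a sketch only,
not the kernel routine): at the start of each 64-byte pass, one byte is
read from each later cache line the pass will use, so the misses are
issued up front. The name csum_copy_touch, the constants LINE_SIZE and
BLOCK, and the big-endian 16-bit word order are assumptions made for
this example. Whether it helps at all depends on the CPU and compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* assumption: 32-byte cache lines, as the patch's andl $-32 implies */
#define BLOCK     64u  /* bytes handled per pass, like the unrolled ROUND loop */

/* Hypothetical helper: copy src to dst and return the folded 16-bit
 * ones'-complement sum of the data, taken as big-endian 16-bit words. */
static uint16_t csum_copy_touch(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    volatile uint8_t sink = 0;  /* volatile keeps the touching loads from being dropped */
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    while (len - i >= BLOCK) {
        /* Touch the later lines of this block with ordinary loads.
         * Both indices are in bounds because len - i >= BLOCK. */
        sink = src[i + LINE_SIZE];
        sink = src[i + BLOCK - 1];

        for (size_t j = i; j < i + BLOCK; j += 2) {
            dst[j] = src[j];
            dst[j + 1] = src[j + 1];
            sum += ((uint32_t)src[j] << 8) | src[j + 1];
        }
        i += BLOCK;
    }

    /* Remaining full 16-bit words. */
    for (; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }

    /* Odd trailing byte is padded with zero on the right. */
    if (i < len) {
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }

    /* Fold the carries back into 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    (void)sink;
    return (uint16_t)sum;
}
```

The patch below does the same thing in assembly with two movb loads per
pass; the C version only shows the shape of the loop.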

I think the code can save part of the overhead of the current
full-copy implementation.

diff -rc linux-2.3.99-pre8/arch/i386/lib/checksum.S linux-2.3.99-pre8-Pf/arch/i386/lib/checksum.S
*** linux-2.3.99-pre8/arch/i386/lib/checksum.S Thu Mar 23 07:23:54 2000
--- linux-2.3.99-pre8-Pf/arch/i386/lib/checksum.S Mon May 15 22:45:41 2000
***************
*** 394,419 ****
          movl ARGBASE+8(%esp),%edi #dst
          movl ARGBASE+12(%esp),%ecx #len
          movl ARGBASE+16(%esp),%eax #sum
! movl %ecx, %edx
          movl %ecx, %ebx
          shrl $6, %ecx
          andl $0x3c, %ebx
          negl %ebx
          subl %ebx, %esi
          subl %ebx, %edi
          lea 3f(%ebx,%ebx), %ebx
          testl %esi, %esi
          jmp *%ebx
  1: addl $64,%esi
          addl $64,%edi
          ROUND1(-64) ROUND(-60) ROUND(-56) ROUND(-52)
          ROUND (-48) ROUND(-44) ROUND(-40) ROUND(-36)
          ROUND (-32) ROUND(-28) ROUND(-24) ROUND(-20)
          ROUND (-16) ROUND(-12) ROUND(-8) ROUND(-4)
  3: adcl $0,%eax
          dec %ecx
          jge 1b
! 4: andl $3, %edx
          jz 7f
          cmpl $2, %edx
          jb 5f
--- 394,425 ----
          movl ARGBASE+8(%esp),%edi #dst
          movl ARGBASE+12(%esp),%ecx #len
          movl ARGBASE+16(%esp),%eax #sum
! # movl %ecx, %edx
          movl %ecx, %ebx
+ movl %esi, %edx
          shrl $6, %ecx
          andl $0x3c, %ebx
          negl %ebx
          subl %ebx, %esi
          subl %ebx, %edi
+ lea -1(%esi),%edx
+ andl $-32,%edx
          lea 3f(%ebx,%ebx), %ebx
          testl %esi, %esi
          jmp *%ebx
  1: addl $64,%esi
          addl $64,%edi
+ SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
          ROUND1(-64) ROUND(-60) ROUND(-56) ROUND(-52)
          ROUND (-48) ROUND(-44) ROUND(-40) ROUND(-36)
          ROUND (-32) ROUND(-28) ROUND(-24) ROUND(-20)
          ROUND (-16) ROUND(-12) ROUND(-8) ROUND(-4)
  3: adcl $0,%eax
+ addl $64, %edx
          dec %ecx
          jge 1b
! 4: movl ARGBASE+12(%esp),%edx #len
! andl $3, %edx
          jz 7f
          cmpl $2, %edx
          jb 5f
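
The address arithmetic added by the hunk above can be sketched in C as
follows. This is one reading of the patch, not a statement of intent:
"lea -1(%esi),%edx ; andl $-32,%edx" rounds (src - 1) down to a 32-byte
boundary, and %edx is then advanced by 64 before each pair of loads at
-32(%edx) and (%edx). The helper names line_base and touched are
hypothetical, and src here stands for %esi after the "subl %ebx, %esi"
adjustment.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* matches the andl $-32 mask in the patch */
#define BLOCK     64u  /* %edx and %esi both advance by 64 per pass */

/* Mirrors: lea -1(%esi),%edx ; andl $-32,%edx */
static uintptr_t line_base(uintptr_t src)
{
    return (src - 1) & ~(uintptr_t)(LINE_SIZE - 1);
}

/* Addresses loaded on pass n (n = 1, 2, ...), under the reading that
 * %edx has been advanced by 64 n times when the two movb loads run. */
static void touched(uintptr_t src, unsigned n, uintptr_t out[2])
{
    assert(n >= 1);
    uintptr_t edx = line_base(src) + (uintptr_t)BLOCK * n;
    out[0] = edx - LINE_SIZE;  /* movb -32(%edx),%bl */
    out[1] = edx;              /* movb (%edx),%bl    */
}
```

For example, with src = 0x1041 the first pass reads bytes 0x1041 through
0x1080, and the two loads land at 0x1060 and 0x1080, i.e. in the second
and third 32-byte lines that the pass covers.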

--
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp


