Re: 2.2.18Pre Lan Performance Rocks!

From: Jeff V. Merkey (jmerkey@vger.timpanogas.org)
Date: Mon Oct 30 2000 - 02:16:46 EST


On Mon, Oct 30, 2000 at 08:08:58AM +0100, Andi Kleen wrote:
> On Sun, Oct 29, 2000 at 11:58:21PM -0700, Jeff V. Merkey wrote:
> > On Mon, Oct 30, 2000 at 07:47:00AM +0100, Andi Kleen wrote:
> > > On Sun, Oct 29, 2000 at 06:35:31PM -0700, Jeff V. Merkey wrote:
> > > > On Mon, Oct 30, 2000 at 12:04:23AM +0000, Alan Cox wrote:
> > > > > > It's still got some problems with NFS (I am seeing a few RPC timeout
> > > > > > errors) so I am backreving to 2.2.17 for the Ute-NWFS release next week,
> > > > > > but it's most impressive.
> > > > >
> > > > > Can you send a summary of the NFS reports to nfs-devel@linux.kernel.org
> > > >
> > > > Yes. I just went home, so I am emailing from my house. I'll post late
> > > > tonight or in the morning. Performance on 100Mbit with NFS going
> > > > Linux->Linux is getting better throughput than IPX NetWare Client ->
> > > > NetWare 5.x on the same network by about 3%. When you start loading up a
> > > > Linux server, it drops off sharply and NetWare keeps scaling, however,
> > > > this does indicate that the LAN code paths are equivalent relative to
> > > > latency vs. MSM/TSM/HSM in NetWare. NetWare does better caching
> > > > (but we'll fix this in Linux next). I think the ring transitions to
> > > > user space daemons are what are causing the scaling problems
> > > > Linux vs. NetWare.
> > >
> > > There are no user space daemons involved in the knfsd fast path, only in slow paths
> > > like mounting.
> >
> > So why is it spawning off nfsd servers in user space? I did notice that all
>
> They just provide a process context, the actual work is only done in kernel mode.
>
> > the connect logic is down in the kernel with the RPC stuff in xprt.c, and this
> > looks to be pretty fast. I know about the checksumming problem with TCPIP.
> > But cycles are cheap on today's processors, so even with this overhead, it
> > could still get faster. IPX uses small packets that are less wire efficient
> > since the ratio of header size to payload is larger than what NFS in Linux
> > is doing, plus there's more of them for an equivalent data transfer, even
> > with packet burst. IPX would tend to be faster if there were multiple
> > routers involved, since the latency of smaller packets would be less and
> > IPX never has to deal with the problem of fragmentation.
> >
> > Is there any CR3 reloading or task switching going on for each interrupt
> > that comes in you know of, BTW? EMON stats show the server's bus utilization
>
> No, interrupts do not change CR3.
>
> One problem in Linux 2.2 is that kernel threads reload their VM on context switch
> (that would include the nfsd thread); this should be fixed in 2.4 with lazy mm.

When you say it reloads its VM, you mean it reloads the CR3 register?
This will cost you about 15 clocks up front, and about 150 clocks over
time as each TLB entry is reloaded (plus it will do LOCK# assertions
invisibly underneath for PTE fetches). One major difference is that in
NetWare, CR3 never gets changed in any network I/O fast path, and none
of the TCP or IPX processes exist in user space, so there's never this
background overhead of page tables being loaded in and out. CR3 only
gets mucked with when someone allocates memory and pages in an address
frame (i.e. when memory is allocated and freed).

The user-space activity is what's causing this, plus the copying. Is there
an easy way to completely disable multiple PTE/PDE address spaces and just
map the entire address space linearly (like NetWare) with a compile option
so I could do a true apples-to-apples comparison?

Jeff

> Hmm actually it should be only fixed for true kernel threads that have been started
> with kernel_thread(), the "pseudo kernel threads" like nfsd uses probably do not
> get that optimization because they don't set their MM to init_mm.
>
> > to get disproportionately higher in Linux than NetWare 5.x and when it hits
> > 60% of total clock cycles, Linux starts dropping off. NetWare 5.x is 1/8
>
> I think that can be explained by the copying.

It's more than that. I am seeing tons of segment-register reloads, which
are very heavy, and atomic reads of PTE entries.

>
> To be sure you could e.g. use the ktrace patch from IKD; it will give you
> cycle-accurate traces of execution of functions

I'll look at this.

> > the bus utilization, which would suggest clocks are being used for something
> > other than pushing packets in and out of the box (the checksumming is done
> > in the tcp code when it copies the fragments, which is very low overhead).
>
> No, unfortunately not. NFS uses UDP and the UDP defragmenting doesn't do copy-checksum

Correct. You're right -- it's in the tcp code. I remember going over
this when I was reviewing Alan's code -- that's what I was thinking of.

Jeff

> currently.
>
>
> -Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:25 EST