Re: NFS read performance ugliness

Zlatko Calusic (Zlatko.Calusic@CARNet.hr)
03 Nov 1998 15:01:19 +0100


alan@lxorguk.ukuu.org.uk (Alan Cox) writes:

> > Writes are so slow that NFS is practically unusable. Too bad, because
> > I have lots of unused disk space on the Suns around here.
> >
> > My tests are on very similar equipment, both computers on 100 Mbps
> > full-duplex switch ports. DE500-BA (de4x5 driver) on the Linux side.
> >
> > I tried changing rsize/wsize parameters, but achieved nothing.
> >
> > All other types of network communication run at half to full network
> > speed (50-100 Mbps). Also, when mounting the Sun's disks on the other
> > Sun, performance is quite OK.
> >
> > Is there any workaround?
>
> Use 2.0.x. The 2.1.x NFS code needs several major modifications before it will
> perform acceptably against Sun equipment - notably write gathering support.
>
> Alan
>
>

Well, I was just making some dumps, from Linux AND Sun, to see where
the difference is.

Before I proceed, I must mention that the 2.0.x kernels are mostly even
worse than 2.1.x (I have tried many times over the last year or two).
The biggest write speed I got was 50 KB/sec, and I remember that on
mysterious occasions I could achieve about 200 KB/sec with 2.1.x (in
some older revision).

So... I made some dumps and they revealed some interesting things
(unfortunately, I can't explain some of them).

Sun <-> Sun NFS communication indeed happens over the NFS v3 protocol
(via TCP), but for the test's sake I forced v2/UDP behaviour.

Here is what I got in that case, when the Suns communicate:

13:14:07.627209 sol-client.170855072 > sol-server.nfs: 1472 proc-7 (frag 39435:1480@0+)
13:14:07.628408 sol-client > sol-server: (frag 39435:1480@1480+)
13:14:07.629655 sol-client > sol-server: (frag 39435:1480@2960+)
13:14:07.630869 sol-client > sol-server: (frag 39435:1480@4440+)
13:14:07.632105 sol-client > sol-server: (frag 39435:1480@5920+)
13:14:07.633329 sol-client > sol-server: (frag 39435:1480@7400+)
13:14:07.634562 sol-client > sol-server: (frag 39435:1480@8880+)
13:14:07.635799 sol-client > sol-server: (frag 39435:1480@10360+)
13:14:07.637022 sol-client > sol-server: (frag 39435:1480@11840+)
13:14:07.638259 sol-client > sol-server: (frag 39435:1480@13320+)
13:14:07.639489 sol-client > sol-server: (frag 39435:1480@14800+)
13:14:07.640722 sol-client > sol-server: (frag 39435:1480@16280+)
13:14:07.641949 sol-client > sol-server: (frag 39435:1480@17760+)
13:14:07.643184 sol-client > sol-server: (frag 39435:1480@19240+)
13:14:07.644416 sol-client > sol-server: (frag 39435:1480@20720+)
13:14:07.645649 sol-client > sol-server: (frag 39435:1480@22200+)
13:14:07.646883 sol-client > sol-server: (frag 39435:1480@23680+)
13:14:07.648105 sol-client > sol-server: (frag 39435:1480@25160+)
13:14:07.649336 sol-client > sol-server: (frag 39435:1480@26640+)
13:14:07.650566 sol-client > sol-server: (frag 39435:1480@28120+)
13:14:07.651806 sol-client > sol-server: (frag 39435:1480@29600+)
13:14:07.653029 sol-client > sol-server: (frag 39435:1480@31080+)
13:14:07.653120 sol-client > sol-server: (frag 39435:360@32560)
13:14:07.654059 sol-server.nfs > sol-client.170855072: reply ok 160 proc-7 (DF)

So, the Suns gather data (as you noted), and thus performance is much
better (the ack is delayed). Also, reading the tcpdump manual, I learnt
that a "+" at the end of a line means "there are more fragments to
come", and when the "+" is not present, that is the final fragment. The
number before the plus sign is the fragment offset. So we can easily
see that the Sun stacks 32768 bytes worth of data per request.
Everything is cool.
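
Just to double-check that number, here is a throwaway calculation I did
(my own quick hack, not from any kernel or tcpdump source; the RPC/NFS
header size is only an estimate):

/*
 * Sanity check of the fragment sizes in the dump above: 22 fragments
 * of 1480 bytes (offsets 0 .. 31080) plus a final 360-byte fragment at
 * offset 32560.  Throwaway program of mine; header sizes are estimates.
 */
#include <stdio.h>

int main(void)
{
        int full_frags = 22;    /* 1480-byte fragments, offsets 0 .. 31080 */
        int frag_size  = 1480;
        int last_frag  = 360;   /* final fragment, at offset 32560 */
        int udp_hdr    = 8;     /* UDP header, carried in the first fragment */

        int ip_payload = full_frags * frag_size + last_frag;    /* 32920 */

        printf("reassembled UDP datagram: %d bytes\n", ip_payload);
        printf("minus the UDP header:     %d bytes\n", ip_payload - udp_hdr);
        /*
         * The remaining 32912 bytes are the RPC call itself; assuming
         * roughly 144 bytes of RPC + NFS WRITE arguments, that is
         * consistent with 32768 bytes of file data per request.
         */
        return 0;
}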

Now look at the logs when Linux writes to Solaris' NFS server (this
time I include more writes, since they're short):

13:54:48.796049 lin-client.1464731136 > sol-server.nfs: 1472 write [|nfs] (frag 4616:1480@0+)
13:54:48.796215 lin-client > sol-server: (frag 4617:1308@2960)
13:54:48.796390 lin-client > sol-server: (frag 4617:1480@1480+)
13:54:48.796554 lin-client.1481508352 > sol-server.nfs: 1472 write [|nfs] (frag 4617:1480@0+)
13:54:48.819163 sol-server.nfs > lin-client.1464731136: reply ok 96 write [|nfs] (DF)
13:54:48.820368 lin-client > sol-server: (frag 4618:1308@2960)
13:54:48.820542 lin-client > sol-server: (frag 4618:1480@1480+)
13:54:48.820721 lin-client.1498285568 > sol-server.nfs: 1472 write [|nfs] (frag 4618:1480@0+)
13:54:48.827512 sol-server.nfs > lin-client.1481508352: reply ok 96 write [|nfs] (DF)
13:54:48.852062 sol-server.nfs > lin-client.1498285568: reply ok 96 write [|nfs] (DF)

What is strange is that the packet order is reversed (in time), if I
read the dumps correctly. First the last fragment is seen, then the
middle one, and at the end the fragment that contains the NFS (IP,
UDP?) headers. Could this be a bug in the Linux NFS implementation?
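
(As far as I understand it, reassembly itself should not mind, since
each fragment carries its own offset and the receiver simply slots it
into place. Here is a toy illustration of mine, nothing to do with the
actual kernel code:)

/*
 * Toy illustration (mine, not kernel code): fragments carry their own
 * offset, so the receiver can copy each one straight into place in the
 * reassembly buffer, no matter in which order they arrive.
 */
#include <stdio.h>
#include <string.h>

struct frag {
        unsigned int offset;    /* offset within the original datagram */
        const char *data;       /* fragment payload */
};

int main(void)
{
        char buf[16] = { 0 };
        /* Listed in "reversed" arrival order, like in the dump above. */
        struct frag frags[] = {
                { 8, "rld\n" },
                { 4, "o wo" },
                { 0, "hell" },
        };
        unsigned int i;

        for (i = 0; i < sizeof(frags) / sizeof(frags[0]); i++)
                memcpy(buf + frags[i].offset, frags[i].data,
                       strlen(frags[i].data));

        fputs(buf, stdout);     /* prints "hello world" despite the order */
        return 0;
}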

Also, I notice that no matter what rsize/wsize options I use, the dump
always looks like that. Is something like rsize,wsize=4096 hardcoded,
so that it can't be changed?
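
For comparison, the same kind of back-of-the-envelope arithmetic on the
Linux dump (again my own throwaway code, with an estimated header
overhead):

/*
 * Each Linux write request in the dump above is three fragments of
 * 1480 + 1480 + 1308 bytes.  Throwaway program of mine; the RPC/NFS
 * header overhead is only an estimate.
 */
#include <stdio.h>

int main(void)
{
        int ip_payload = 1480 + 1480 + 1308;    /* one reassembled datagram */
        int udp_hdr    = 8;                     /* UDP header */
        int rpc_nfs    = 164;                   /* rough RPC + NFS WRITE args */

        printf("UDP datagram:        %d bytes\n", ip_payload);
        printf("estimated file data: %d bytes\n",
               ip_payload - udp_hdr - rpc_nfs); /* comes out to 4096 */
        return 0;
}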

I'm confused! :)

-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
       A good way to deal with predators is to taste terrible.
