Re: nfs udp 1000/100baseT issue

From: Neil Brown
Date: Thu Mar 16 2006 - 21:20:28 EST


On Thursday March 16, magnade@xxxxxxxxx wrote:
> On 3/16/06, Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> wrote:
> > >
> > >A while ago I noticed an issue when an NFS server has a gigabit
> > >connection to a network and a client connects to that network via
> > >100baseT instead: the UDP connection from client to server fails. The
> > >client gets a "server not responding" message when trying to access a
> > >file. The interesting bit is that you can get a directory listing
> > >without issue. The workaround I found is adding proto=tcp on the
> > >client side, after which everything works without error.
> >
> > UDP has its implications, like silently dropping packets when the link
> > is full, by design. Try tcpdump on both systems and compare what packets
> > are sent and which do arrive. The error message is then probably because
> > the client is confused by not receiving some packets.
>
> After comparing a working and a non-working client, I found that
> packets containing offsets 19240, 20720 and 22200 are missing,
> and the 100baseT client had an extra offset of 32560;
> on the working client it ends at 31080.
>
> The missing ones are missing almost constantly: 22200 appears every so
> often on retransmission, and 23680 also disappears every so often.
>
> I hope that isn't too confusing; I don't use tcpdump-type tools much
> (well, I did give up on tcpdump and had to use ethereal...)

This is all to be expected. I remember having this issue with a
server on 100M and clients on 10M...

There is no flow control in UDP. If anything gets lost, the client
has to resend the request, and the server then has to respond again.
If the response is large (e.g. a read) and gets fragmented (if > 1500 bytes)
then there is a good chance that one or more fragments of a reply will
get lost in the switch stepping down from 1G to 100M. A datagram with a
missing fragment cannot be reassembled, so the whole reply is lost and
the retry runs into the same bottleneck. Every time.
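
To see where the offsets reported above come from, here is a rough
sketch of how a large UDP reply is cut into IP fragments. It assumes a
1500-byte Ethernet MTU, a 20-byte IPv4 header without options, and a
hypothetical 128 bytes of RPC/NFS reply header on a 32768-byte read; the
function name and the header size are illustrative, not taken from the
capture.

```python
IP_HEADER = 20    # bytes; IPv4 header without options (assumption)
UDP_HEADER = 8    # bytes
MTU = 1500        # bytes; standard Ethernet MTU (assumption)


def fragment_offsets(udp_payload_len, mtu=MTU):
    """Return the byte offset of each IP fragment of one UDP datagram."""
    # The UDP header travels inside the first fragment's IP payload.
    total = udp_payload_len + UDP_HEADER
    # Fragment data length must be a multiple of 8 bytes (except the last).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return list(range(0, total, per_fragment))


# Hypothetical 32768-byte read reply plus 128 bytes of RPC/NFS header.
offsets = fragment_offsets(32768 + 128)
print(len(offsets), offsets[-1])
```

Under these assumptions every offset is a multiple of 1480, which is
consistent with the values seen in the capture (19240, 20720, 22200,
23680, 31080, 32560). The server sends all of these fragments back to
back at 1G, and the switch has to queue them for the 100M port.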

Your options include:

- use tcp
- get a switch with a (much) bigger packet buffer
- drop the server down to 100M
- drop the nfs rsize down to 1024 so you don't get fragments.
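
As a rough illustration of the last option, the sketch below counts how
many IP fragments one read reply needs for a few rsize values. It
assumes a 1500-byte MTU, a 20-byte IPv4 header, an 8-byte UDP header and
a hypothetical 128 bytes of RPC/NFS reply header; the function name and
the overhead figure are made up for the example.

```python
import math


def fragments_needed(rsize, rpc_overhead=128, mtu=1500):
    """Estimate the number of IP fragments for one NFS-over-UDP read reply."""
    ip_header = 20                        # IPv4 header without options (assumption)
    udp_header = 8
    total = rsize + rpc_overhead + udp_header
    per_fragment = (mtu - ip_header) // 8 * 8   # 1480 bytes for a 1500-byte MTU
    return math.ceil(total / per_fragment)


for rsize in (1024, 8192, 32768):
    print(rsize, fragments_needed(rsize))
```

With these numbers an rsize of 1024 fits in a single packet, so losing
one packet costs one small retry rather than a whole multi-fragment
reply.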

NeilBrown