Basically, the server-side write gathering code is, er, sub-optimal.
I've played with the delay parameters a lot, but I never arrived at
a satisfactory set of values. 60KBps is still a bit low. I've been
seeing something around 200KBps, but that may have been specific to
my combination of client/NICs etc.
What the code currently(?) does is check whether the write goes
to the same file as the last one. If so, we assume that there are more
to come, and delay syncing the data for a bit. If the inode is still
dirty after that, we call write_inode_now(). This _usually_ works,
but it's in no way foolproof.
The problem is that we can't know in advance whether the next NFS request
we receive is another write call for the same file. The packet isn't
looked at until the current nfsd thread sleeps, which it usually won't
do unless it has to wait for disk I/O. So the earliest point at which the
next packet gets inspected is when we're already syncing the file/inode.
So nfsd will either have to peek at the packet as it's delivered by
the data_ready network callback (UDP only), or do some delaying when
it looks as if the next few calls could be clustered.
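For the peeking option, the check itself is cheap. The sketch below assumes NFS version 2 over UDP. In that setting a datagram is a bare ONC RPC call (RFC 1057): six 32-bit header words, a credential, and a verifier. For WRITE (procedure 8 in RFC 1094) these are followed by the 32-byte file handle. The sketch only answers "is this a WRITE to the same file?". The function names are hypothetical. In userspace the buffer could be filled with recv() and MSG_PEEK, which leaves the datagram queued. Inside the kernel it would presumably come from the socket buffer seen by the data_ready callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RPC_CALL      0u       /* msg_type for a call (RFC 1057) */
#define RPC_VERS      2u       /* ONC RPC protocol version */
#define NFS_PROGRAM   100003u  /* RPC program number for NFS */
#define NFS_V2        2u
#define NFSPROC_WRITE 8u       /* WRITE procedure in NFS version 2 */
#define NFS_FHSIZE    32       /* v2 file handles are 32 opaque bytes */

/* Read one big-endian XDR word at *off and advance; -1 if truncated. */
static int xdr_u32(const uint8_t *buf, size_t len, size_t *off, uint32_t *out)
{
    if (len < 4 || *off > len - 4)
        return -1;
    *out = (uint32_t)buf[*off] << 24 | (uint32_t)buf[*off + 1] << 16 |
           (uint32_t)buf[*off + 2] << 8 | (uint32_t)buf[*off + 3];
    *off += 4;
    return 0;
}

/* Skip an opaque_auth: flavor, length, body padded to a 4-byte boundary. */
static int xdr_skip_auth(const uint8_t *buf, size_t len, size_t *off)
{
    uint32_t flavor, body;

    if (xdr_u32(buf, len, off, &flavor) || xdr_u32(buf, len, off, &body))
        return -1;
    (void)flavor;                  /* not needed just to skip the body */
    if (body > 400)                /* auth bodies are limited to 400 bytes */
        return -1;
    body = (body + 3) & ~3u;
    if (body > len - *off)
        return -1;
    *off += body;
    return 0;
}

/* 1 if buf holds an NFSv2 WRITE call for file handle fh, else 0. */
static int is_write_to_fh(const uint8_t *buf, size_t len,
                          const uint8_t fh[NFS_FHSIZE])
{
    uint32_t hdr[6];               /* xid, msg_type, rpcvers, prog, vers, proc */
    size_t off = 0;

    assert(buf != NULL && fh != NULL);
    for (int i = 0; i < 6; i++)
        if (xdr_u32(buf, len, &off, &hdr[i]))
            return 0;
    if (hdr[1] != RPC_CALL || hdr[2] != RPC_VERS || hdr[3] != NFS_PROGRAM ||
        hdr[4] != NFS_V2 || hdr[5] != NFSPROC_WRITE)
        return 0;
    if (xdr_skip_auth(buf, len, &off) || xdr_skip_auth(buf, len, &off))
        return 0;                  /* credential, then verifier */
    if (len - off < NFS_FHSIZE)
        return 0;
    return memcmp(buf + off, fh, NFS_FHSIZE) == 0;
}
```

Truncated or non-NFS datagrams simply return 0, so a wrong guess only costs the normal, undelayed sync.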
A somewhat improved design might look like this:

    if (we should do write gathering) {
        static struct wait_queue wait = ...;

        /* only a limited number of nfsd threads may sit and wait */
        if (less than N nfsd's delaying execution)
            interruptible_sleep_on_timeout(&inode->i_wait, ...);

        /* nobody flushed the data in the meantime, so do it now */
        if (inode->i_state & I_DIRTY) {
            sync file/inode
        }
        wake_up(&wait);
    }
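The improved design can be tried out in userspace with POSIX threads. In this sketch a condition variable stands in for inode->i_wait. A mutex-guarded counter implements "less than N nfsd's delaying", and clearing a dirty flag stands in for syncing the file/inode. MAX_DELAYING, struct gfile and gathered_write() are hypothetical names. Mapping the pseudocode's wake_up(&wait) onto a broadcast on the same condition variable is an interpretation, not something the design spells out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

#define MAX_DELAYING 4        /* hypothetical value for "N" */

/* Hypothetical stand-in for an inode with pending writes. */
struct gfile {
    pthread_mutex_t lock;
    pthread_cond_t  wait;     /* plays the role of inode->i_wait */
    int dirty;                /* like I_DIRTY: unsynced data pending */
    int syncs;                /* number of times the file was synced */
};

static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static int n_delaying;        /* threads currently sleeping in gathered_write() */

/* Admit the caller as a delayer if fewer than MAX_DELAYING are sleeping. */
static int gate_enter(void)
{
    int ok;

    pthread_mutex_lock(&gate_lock);
    ok = n_delaying < MAX_DELAYING;
    if (ok)
        n_delaying++;
    pthread_mutex_unlock(&gate_lock);
    return ok;
}

static void gate_leave(void)
{
    pthread_mutex_lock(&gate_lock);
    assert(n_delaying > 0);
    n_delaying--;
    pthread_mutex_unlock(&gate_lock);
}

/* Called after a WRITE has put its data in the cache (hypothetical). */
static void gathered_write(struct gfile *f, long delay_ms)
{
    assert(delay_ms >= 0);
    pthread_mutex_lock(&f->lock);
    f->dirty = 1;

    if (gate_enter()) {
        struct timespec deadline;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += delay_ms / 1000;
        deadline.tv_nsec += (delay_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
        /* Sleep until the timeout or until another thread syncs the file;
         * spurious wakeups just loop back to the dirty check. */
        while (f->dirty) {
            if (pthread_cond_timedwait(&f->wait, &f->lock, &deadline) == ETIMEDOUT)
                break;
        }
        gate_leave();
    }

    if (f->dirty) {           /* nobody flushed it while we slept */
        f->dirty = 0;         /* stand-in for syncing the file/inode */
        f->syncs++;
    }
    pthread_cond_broadcast(&f->wait);   /* the wake_up() step */
    pthread_mutex_unlock(&f->lock);
}
```

The gate is what keeps the delay from stalling the server. Once N threads are parked, further writers sync immediately. Their wake-up releases anyone sleeping on the same file, who then returns without a second sync unless another write has dirtied the file again.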
I guess the VFS buffs will come up with something better...
And of course you may have to enable the wgather exports option, even
though I think it's on by default.
Olaf
--
Olaf Kirch         |  --- o --- Nous sommes du soleil we love when we play
okir@monad.swb.de  |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax