I sent in a note on NFS performance issues some time back, and
Mark Hemment was kind enough to point us to the zero-copy
networking patch. We tried it, and it does seem to bring some
improvement, but it also appears to have hurt NFS performance a
bit: we now see a LOT of RPC failures for all kinds of calls,
from lookup through to reads and writes. Could this be triggered
by the patch (picked up from davem's site for 2.4.0)?
On the other hand, we do plan to migrate to 2.4.2. Can somebody
update me, or provide pointers to information, on whether some of
these problems have been resolved in 2.4.2? We should be testing
on 2.4.2 soon.
---- On Wed, 4 Apr 2001, Mark Hemment (firstname.lastname@example.org) wrote:
> I believe David Miller's latest zero-copy patches might help here.
> In his patch, the pull-up buffer is now allocated in the sunrpc code,
> so it can be a blocking allocation.
> This doesn't fix the core VM problems, but does relieve the pressure
> _slightly_ on the VM (I assume - I haven't tried David's patch myself).
> One of the core problems is that the VM keeps no measure of the
> page fragmentation in the free page pool. The system reaches a state of
> having plenty of free single pages (so kswapd and friends aren't woken
> - or if they are, they do little or no work), and very few large pages
> (which you need for some of the NFS requests).
> Unfortunately, even keeping a measure of fragmentation, and ensuring
> work is done when a threshold is reached, doesn't solve the whole
> problem.
> When a large order request comes in, the inactive_clean page list is
> reaped. As reclaim_page() simply selects the "oldest" page, with
> no regard as to whether it will buddy (now, or possibly in the
> future), this list is quickly shrunk by a large order request - far too
> quickly for a well behaved system.
> An NFS write request, with an 8K block size, needs an order-2 pull-
> up buffer (we shouldn't really be pulling the header into the same
> buffer as the data - perhaps we aren't any more?). On a well used
> system, an order-2 _blocking_ allocation ends up populating the order-0
> free lists with quite a few pages from the inactive_clean list.
> This then triggers another problem. :(
> As large (non-zero) order requests are always from the NORMAL or DMA
> zones, these zones tend to have a lot of free pages (put there by the
> blind reclaim_page() - well, once blocking allocations are possible
> they do, or when the fragmentation kicking is working).
> New allocations for pages for the page-cache often ignore the HIGHMEM
> zone (it reaches a steady state, and so is passed over by the zone
> loop at the head of __alloc_pages()).
> However, the NORMAL and DMA zones tend to be above pages_low (due to
> the reason above), and so new page-cache pages come from these zones.
> On a HIGHMEM system this leads to thrashing of the NORMAL zone, while
> the HIGHMEM zone stays (relatively) quiet.
> Note: To make matters even worse under this condition, stealing pages
> out of the NORMAL zone is exactly what you don't want to happen! It
> would be much better if they could be left alone for a (short) while
> to have a chance to buddy - Linux (at present) doesn't care about the
> pages in the HIGHMEM zone (no non-zero order allocations come from it).
> I was working on these problems (and some others) a few weeks ago, and
> will return to them shortly. Unfortunately, the changes needed
> look too large for 2.4....
> Also, for NFS, the best solution for now might be to give each thread
> its own pre-allocated receive buffer. With David's patches, the
> pull-up occurs in the context of a thread, making this possible.
> This doesn't solve the problem for other subsystems which do non-zero
> order page allocations, but (perhaps) they occur at a low enough rate
> not to be of real issue.
> Note: Ensure you put a "sync" in your /etc/exports - the default
> behaviour was "async" (not legal for a valid SpecFS run).
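For anyone else following along, an export line with the option spelled
out explicitly looks like this (the path and client network are
placeholders):

```
# /etc/exports
/srv/nfs  192.168.1.0/24(rw,sync)
```

Without an explicit option, old nfs-utils defaulted to "async", which
acknowledges writes before they hit stable storage.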
> On Wed, 4 Apr 2001, Alan Cox wrote:
> > > We have been seeing some problems with running NFS servers
> > > at very high loads and were wondering if somebody could give us
> > > some pointers to where the problem lies.
> > > The system is a 2.4.0 kernel on a Red Hat 6.2 distribution.
> >
> > Use 2.2.19. The 2.4 VM is currently too broken to survive such
> > tests without going silly
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to email@example.com
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Mon Apr 30 2001 - 21:00:14 EST