Marcelo Tosatti wrote:
>
> On Tue, 27 Nov 2001, Andrew Morton wrote:
>
> > Mike Fedyk wrote:
> > >
> > > > I'll send you a patch which makes the VM less inclined to page things
> > > > out in the presence of heavy writes, and which decreases read
> > > > latencies.
> > > >
> > > Is this patch posted anywhere?
> >
> > I sent it yesterday, in this thread. Here it is again.
> >
> > Description:
> >
> > - Account for locked as well as dirty buffers when deciding
> > to throttle writers.
>
> Just one thing: if we have lots of locked buffers due to reads, we may
> unnecessarily block writes, and that's no good.
True. I believe this change makes balance_dirty() work as it was
originally intended to work. But in so doing, lots of things change.
Various places which have been tuned for the broken balance_dirty()
behaviour may need to be retuned. It needs testing and thought, and
a comment from Linus would be helpful.
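Roughly, the intended check looks like this (a sketch with illustrative
names, not the actual fs/buffer.c hunk):

    /*
     * Sketch only: count locked buffers as well as dirty ones when
     * deciding whether a writer should be throttled.  The accounting
     * names are meant to be suggestive of 2.4's fs/buffer.c; treat
     * them as assumptions, not a verbatim patch.
     */
    static int writer_should_throttle(void)
    {
        unsigned long dirty  = size_buffers_type[BUF_DIRTY]  >> PAGE_SHIFT;
        unsigned long locked = size_buffers_type[BUF_LOCKED] >> PAGE_SHIFT;
        unsigned long limit  = dirty_buffer_limit();  /* hypothetical: some % of buffer pages */

        /* Previously only `dirty' was tested here. */
        return dirty + locked > limit;
    }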
> But well, I'd rather fix interactivity than worry about that one kind
> of workload, so I'm OK with it.
>
> > - Tweak VM to make it work the inactive list harder, before starting
> > to evict pages or swap.
>
> I would like to see the interactivity problems get fixed on the block
> layer side first: it's not a VM issue initially. Actually, the thing is
> that if you tweak the VM this way you're going to break some workloads.
Possibly. I have a feeling that the VM is a bit too swap-happy,
especially in the presence of heavy write() loads. I'd rather
see more aggressive dropbehind on the write() data than see
useful cache data dropped. But I'm not sure yet.
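By "dropbehind" I mean something along these lines, applied to streaming
write() data (a sketch only; the helpers are hypothetical, and only
deactivate_page() is meant to suggest a real 2.4 interface):

    /*
     * Sketch only: once a page from a streaming write has been queued
     * for writeout, push it towards the reclaim tail so it is taken
     * before still-useful cache.  Both helpers below are hypothetical.
     */
    if (write_is_sequential(file)) {                       /* hypothetical test */
        struct page *prev = previously_written_page(file); /* hypothetical lookup */

        if (prev)
            deactivate_page(prev);  /* prefer reclaiming it over useful cache */
    }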
> > - Change the elevator so that once a request's latency has
> > expired, we can still perform merges in front of that
> > request. But we no longer will insert new requests in
> > front of that request.
>
> Sounds fine... I've received quite a few success reports already, right?
A few people have reported success. Nathan Grennan didn't.
The elevator change also needs more testing and review.
There's a possibility that it could cause a seek-storm collapse
when interacting with readahead. Currently, readahead does this:
    for (some pages) {
        alloc_page()
        page_cache_read()
    }
See the potential here for the alloc_page() to get abducted
by shrink_cache(), to perform IO, and to not return until after
the previous page_cache_read() has been submitted to the device?
Ouch. Putting reads nearer the elevator head exposes this possibility.
It seems to not happen, due to the vagaries of the VM-of-the-minute,
and the workload. But it could.
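For reference, the insertion rule amounts to roughly the following scan
(a sketch throughout, with hypothetical helper names; it is not the
actual elevator_linus code):

    /*
     * Sketch only: walk the request queue from the tail towards the
     * head.  Merging into a request ahead of an expired one is still
     * allowed, but the insertion point is never moved in front of a
     * request whose latency has expired.
     */
    static void add_request_sketch(request_queue_t *q, struct buffer_head *bh)
    {
        struct request *rq, *insert_point = NULL;
        int passed_expired = 0;

        for (rq = last_request(q); rq != NULL; rq = prior_request(q, rq)) {  /* hypothetical walkers */
            if (requests_mergeable(rq, bh)) {      /* hypothetical merge test */
                merge_request(rq, bh);             /* merges remain legal, even ahead of expired requests */
                return;
            }
            if (rq->elevator_sequence <= 0)
                passed_expired = 1;                /* nothing new may be inserted in front of this one */
            if (!passed_expired && in_order_before(rq, bh))  /* hypothetical ordering test */
                insert_point = rq;
        }
        insert_new_request_behind(q, insert_point, bh);  /* tail if insert_point is NULL */
    }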
So the obvious readahead-side change is to allocate all the readahead
pages up front before issuing the reads. I rewrote the readahead code
to do this (and dropped about 300 lines from filemap.c in the process),
but given that the condition doesn't trigger, it doesn't make much
difference.
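The shape of that change is roughly as follows (a sketch, not the
rewrite itself; only alloc_page() is intended to be a real interface
here):

    /*
     * Sketch only: allocate the whole readahead window before
     * submitting any reads, so that a later allocation cannot wander
     * into shrink_cache(), do IO, and stall behind reads which are
     * already queued.
     */
    struct page *pages[32];     /* assumed bound on the window */
    int i, nr = 0;

    for (i = 0; i < window_size; i++) {
        struct page *page = alloc_page(GFP_USER);  /* all blocking/reclaim happens here, up front */
        if (!page)
            break;
        pages[nr++] = page;
    }

    for (i = 0; i < nr; i++)
        add_to_cache_and_read(mapping, start_index + i, pages[i]);  /* hypothetical: insert + submit */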
I've spent a week so far looking closely at various performance
and usability problems with 2.4. It's still a work-in-progress.
I don't feel ready to start offering anything for merging yet,
really. Some of these things interact, and I'd prefer to get
more off-stream testing done, as well as code review.
Current patchset is at http://www.zip.com.au/~akpm/linux/2.4/2.4.17-pre1/
The list so far is:
vm-fixes.patch
    The balance_dirty() and less swap-happy changes.
write-cluster.patch
    ext2 metadata prereading and various other hacks which
    prevent writes from stumbling over reads, and thus ruining
    write clustering. This patch is in the early prototype stage.
readhead.patch
    VM readahead rewrite. Designed to avoid the above
    problem, to make readahead growth more aggressive,
    and to make readahead shrinking less aggressive. I
    don't see why we should drop the readahead window on the
    floor if someone has read a few megs from a file and then
    seeks elsewhere within it. Also uses common code for
    mmap readahead. The madvise explicit dropbehind code
    accidentally died. Oh well.
    Testing with paging-intensive workloads (start X11, staroffice6)
    indicates that we indeed do more IO, in fewer requests. But
    walltime doesn't change. I may not proceed with this.
mini-ll.patch
    A kinder, gentler low-latency patch, based on the one which
    Andrea is maintaining. Doesn't drop any locks. As far as
    I'm concerned, this can be merged today (six months ago, in
    fact). It gives practically all the perceived benefit of
    the preemptive kernel patch and is clearly safe. (A sketch
    of the kind of rescheduling point it adds is at the end of
    this mail.)
    A number of vendors are shipping kernels which are patched
    to add rescheduling points to copy_*_user(), which is
    much less effective than this patch. They shouldn't
    be doing this.
elevator.patch
    The previously-described elevator changes.
inline.patch
    Drops a large number of ill-chosen `inline' qualifiers
    from the kernel. Removes a total of about 12,000 bytes
    of instructions, almost all from the very hottest parts of
    the kernel. Should prove useful for computers which
    have an L1 cache which is faster than main memory.
block-alloc.patch
    My nemesis. Fixing the long- and short-term fragmentation
    of ext2/ext3 blocks would be a more significant performance
    boost than anything else in the 2.4 series. But it's just
    proving intractable. I'll probably have to drop most of
    this, and look at online defrag. There's potential for
    a 3x to 5x speedup here.
Also need to do something about the stalls which Nathan Grennan
has reported. On ext3 it seems to be due to atime updates.
Not sure about ext2 yet.
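Finally, for anyone who hasn't seen one, the rescheduling points which
mini-ll.patch adds look like this; the surrounding loop and the helper
are illustrative, not an actual hunk from the patch:

    /*
     * Sketch only: a conditional rescheduling point dropped into a
     * long-running kernel loop.  The list and the work function are
     * hypothetical; current->need_resched and schedule() are the real
     * 2.4 interfaces involved.
     */
    struct list_head *entry;

    list_for_each(entry, &some_long_list) {  /* hypothetical long scan */
        process_one_entry(entry);            /* hypothetical work */

        if (current->need_resched)           /* someone else wants the CPU */
            schedule();                      /* give it up voluntarily */
    }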