Re: [2.1.130-3] Page cache DEFINATELY too persistant... feature?

Stephen C. Tweedie (sct@redhat.com)
Thu, 26 Nov 1998 12:36:37 GMT


Hi,

On Wed, 25 Nov 1998 21:49:51 -0800, Benjamin Redelings I
<bredelin@ucsd.edu> said:

> Is this not absurd, or am I missing something? The swap finally went up
> to 30 Mb, before decreasing. On exiting the installer, swap went down
> to 12Mb.

> I was simply updating my debian system over the web, which consisted of
> downloading 11 Mb of compressed archives, uncompressing them, and then
> installing them.

Right. Linus, it looks like this is a result of a long-standing
property of the buffer cache, namely that we only ever throttle writes
when we run out of memory. If we have large quantities of dirty buffers
due to an install routine, then shrink_mmap becomes less and less
able to free memory. The recent vm changes make it a lot easier to
swap things out in this situation.
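
To make that concrete, here is a tiny user-space model (toy code, not
kernel source; the names and numbers are made up purely for
illustration) of why a shrink_mmap-style scan finds less and less that
it can free as dirty buffers pile up:

#include <stdio.h>

#define NBUFS 1000

struct buf {
        int dirty;      /* written but not yet flushed to disk */
        int locked;     /* currently under I/O                 */
};

/* One reclaim pass: only clean, unlocked buffers are candidates. */
static int scan_freeable(const struct buf *pool, int n)
{
        int freeable = 0;
        for (int i = 0; i < n; i++)
                if (!pool[i].dirty && !pool[i].locked)
                        freeable++;
        return freeable;
}

int main(void)
{
        static struct buf pool[NBUFS];

        /* Simulate an install-style write load dirtying more and more
         * of the pool faster than it gets written back. */
        for (int dirty = 0; dirty <= NBUFS; dirty += 250) {
                for (int i = 0; i < NBUFS; i++)
                        pool[i].dirty = (i < dirty);
                printf("%4d dirty -> %4d freeable\n",
                       dirty, scan_freeable(pool, NBUFS));
        }
        return 0;
}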

The 130-pre3 changes seem to have fixed the swap aggression in the
normal case when we are loading the page cache primarily with reads, but
under high write load we still drop rapidly into swap as soon as the
buffer cache is saturated.

In the past, it has been suggested frequently that we may need to
throttle write load per device or system-wide before getting to the
point where get_free_pages(GFP_BUFFER) starts failing. Right now, all
we do is start up bdflush: we don't actually suspend the writes. As a
result, as soon as the cache fills with dirty/locked buffers we start
swapping very aggressively (and the amount of swap that Ben is seeing in
this situation actually does credit to how quickly we can swap memory
out in this case!).
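
The "all we do is start up bdflush" part is easy to model.  A rough
user-space sketch (again just a toy with made-up numbers, not the real
buffer-cache code) of a writer which only ever wakes the flusher and
never waits for it:

#include <stdio.h>

#define POOL_SIZE       1000    /* total buffers available                */
#define FLUSH_INTERVAL   100    /* the flusher gets to run every N writes */
#define FLUSH_BATCH       50    /* ...and cleans this many per run        */

int main(void)
{
        int dirty = 0;

        for (int writes = 1; writes <= 5000; writes++) {
                dirty++;                        /* the writer never sleeps   */
                if (writes % FLUSH_INTERVAL == 0)
                        dirty -= FLUSH_BATCH;   /* the flusher can't keep up */

                if (dirty >= POOL_SIZE) {
                        printf("pool full of dirty buffers after %d writes;"
                               " from here on the only way to find memory"
                               " is to swap\n", writes);
                        return 0;
                }
        }
        printf("dirty buffers outstanding: %d of %d\n", dirty, POOL_SIZE);
        return 0;
}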

There are basically two ways I think we can address this. We can either
throttle the writes, or we can extend what we have done recently in
try_to_free_pages to include dirty buffers. In try_to_free_pages(), we
currently have one memory-reclaiming function --- shrink_mmap() ---
which discards pages from the page/swap cache, and a memory-returning
function --- swap_out --- which tries to return clean, reusable pages to
shrink_mmap. It would not be hard to add a new page source of the form

case 1:
        if (bdflush_active) {
                /* Dirty buffers are already being flushed: block
                 * further GFP_LOW allocations until bdflush has
                 * completed, then restart the reclaim loop. */
                down(&may_alloc_GFP_LOW);
                wait_on(&bdflush_completion);
                up(&may_alloc_GFP_LOW);
                state = 0;
                continue;
        }

and have get_free_pages() wait on the may_alloc_GFP_LOW semaphore if it
cannot find memory immediately and the priority is GFP_LOW.
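
The waiting side is just a sleep-until-signalled pattern.  A minimal
user-space sketch using POSIX threads (illustrative only:
may_alloc_GFP_LOW and bdflush_completion here are local stand-ins for
the kernel objects above, not the real thing):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  bdflush_completion = PTHREAD_COND_INITIALIZER;
static int flush_done;

/* Stand-in for the proposed wait: a low-priority allocator that failed
 * to find memory sleeps here until a flush pass has completed. */
static void wait_for_flush(void)
{
        pthread_mutex_lock(&lock);
        while (!flush_done)
                pthread_cond_wait(&bdflush_completion, &lock);
        pthread_mutex_unlock(&lock);
}

/* Stand-in for bdflush: write back dirty buffers, then wake waiters. */
static void *flusher(void *arg)
{
        (void)arg;
        sleep(1);                       /* pretend to do the writeback */
        pthread_mutex_lock(&lock);
        flush_done = 1;
        pthread_cond_broadcast(&bdflush_completion);
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, flusher, NULL);
        printf("GFP_LOW allocation failed; waiting on bdflush...\n");
        wait_for_flush();               /* throttled instead of swapping */
        printf("flush complete; retry the allocation\n");
        pthread_join(t, NULL);
        return 0;
}

The point of the exercise is that the blocked allocator burns no CPU
and touches no swap while the dirty list drains.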

This would take advantage of the existing bdflush mechanisms and tuning
parameters rather than adding anything new. It would make sure that
intense buffer activity is throttled only at the point where bdflush has
already decided that there is too much write activity: moderate write
load or a small pool of write buffers would still cause us to look
elsewhere for buffers to free.
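
Put another way, the only new decision is a threshold test against
state which bdflush already maintains.  Something of this shape, where
the 60% figure is purely illustrative and not a real tuning value:

#include <stdio.h>

#define NFRACT_PERCENT 60       /* stand-in for bdflush's dirty threshold */

/* Returns 1 if an allocator failing at low priority should wait for
 * bdflush, 0 if it should try shrink_mmap()/swap_out() instead. */
static int should_throttle(int dirty_buffers, int total_buffers)
{
        return dirty_buffers * 100 > total_buffers * NFRACT_PERCENT;
}

int main(void)
{
        printf("200/1000 dirty: throttle=%d\n", should_throttle(200, 1000));
        printf("800/1000 dirty: throttle=%d\n", should_throttle(800, 1000));
        return 0;
}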

This looks like a natural way to take advantage of existing behaviour to
fix a very long-standing write performance problem. Comments? Would
you prefer a more natural way of throttling writes at source?

--Stephen
