Re: [2.1.130-3] Page cache DEFINITELY too persistent... feature?

Stephen C. Tweedie (sct@redhat.com)
Thu, 26 Nov 1998 20:09:47 GMT


Hi,

On Thu, 26 Nov 1998 09:49:47 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> said:

> Hmm.. I know we _used_ to do a sync() in do_free_pages() when we
> started to need to page things out exactly to avoid this issue.
> Considering how much has changed I wouldn't be surprised if that has
> been disabled or deleted by mistake,

It looks like it: even 2.0 doesn't have anything that looks like a sync
in vmscan.c.

> because it tends to happen only under specific circumstances that are
> _not_ the normal things you tend to test when you test low-memory
> situations.

Unless you are testing mke2fs on a Red Hat install for 8MB machines
before you've got swap set up. You run out of free memory _real_ fast
that way. :)

> We should really be a lot more aggressive about getting rid of dirty
> buffer cache entries. They are almost never worth having around.

Agreed, but that's not the main problem here: the problem is not how
quickly we get rid of ex-dirty buffers once they are recyclable, it is
how to limit the number of dirty buffers in the first place. We can
only write them back to disk at the drive's rate, but mke2fs (for
example) can generate new dirty buffers almost arbitrarily quickly.
That's where the throttling issue comes in.
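
To make the rate mismatch concrete, here's a toy user-space sketch
(all names and numbers are invented for illustration; this is not
kernel code). The producer dirties buffers ten times faster than
the "disk" retires them, so memory fills within a dozen ticks no
matter how promptly we recycle the clean ones:

#include <stdio.h>

#define DISK_RATE      10       /* buffers retired per tick (invented) */
#define PRODUCER_RATE 100       /* buffers dirtied per tick (invented) */
#define MEMORY_LIMIT 1000       /* buffers that fit in memory (invented) */

int main(void)
{
        int dirty = 0, tick;

        for (tick = 0; tick < 20; tick++) {
                dirty += PRODUCER_RATE;         /* mke2fs-alike dirtying */
                dirty -= DISK_RATE;             /* writeback completing */
                if (dirty > MEMORY_LIMIT) {
                        printf("tick %d: memory full of dirty buffers\n",
                               tick);
                        break;
                }
                printf("tick %d: %d dirty buffers\n", tick, dirty);
        }
        return 0;
}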

> I think throttling them at the source is just wrong, and think your
> suggestion makes sense. I don't know whether it makes much sense to do
> this with bdflush, though - I'd be more inclined to just do it directly.

Unfortunately, if we have parallel syncs or bdflushes active, we end up
seriously thrashing the disks. I've got reports of slowdowns of between
500 and 1000% when we get concurrent sync()s active. Keeping it in
bdflush would help to avoid that.

bdflush is also a natural place to trickle back batches of writes:
unlike sync, bdflush has an upper limit on how much it will write back
before looping through the wait. If we just do a sync(), then once
memory is full of dirty buffers, chances are we'll have the sync and
bdflush competing for the disk and thrashing like crazy. Using bdflush,
we can still have processes filling memory, but once buffer space is
fully expanded it will loop round writing some buffers and then freeing
them. It's important not to wait for _all_ buffers to be written before
we start reclaiming their space!
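
Roughly, in toy user-space form (NDIRTY and the numbers are made
up, not the real bdflush parameters): each pass writes a bounded
batch which becomes reclaimable immediately, instead of nothing
being reclaimable until the entire backlog has gone out as with a
sync():

#include <stdio.h>

#define NDIRTY 500              /* invented per-pass write-back limit */

int main(void)
{
        int dirty = 2300;       /* pretend backlog of dirty buffers */
        int pass = 0;

        while (dirty > 0) {
                int n = dirty < NDIRTY ? dirty : NDIRTY;

                dirty -= n;     /* this batch is written and now clean */
                pass++;
                printf("pass %d: wrote %d, reclaimable now; %d left\n",
                       pass, n, dirty);
                /* a sync() would reclaim nothing until all 2300
                 * had been written */
        }
        return 0;
}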

> Now, the same should be true of "bdflush" - we could count on bdflush to
> generally keep the number of dirty pages down, and to balance the peak
> usage so that we don't get bouts of extremely heavy disk activity (like we
> did with the old "update" process). But when we really need to write stuff
> out, the page-out process should really try to free stuff up itself.

> Would you agree with that analogy?

To some extent, but having "the page-out process" (kswapd) completely
stall all activity until we have done a sync seems unwise if we have
network activity in progress. The other advantage of using bdflush is
that it already has tuning parameters which will allow us to decide just
how full things get before we start the flush-out, and we also have
existing links in the mark_buffer_dirty code to force a bdflush wakeup.
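
Something like this toy sketch (invented names throughout, not
the real mark_buffer_dirty/bdflush interfaces): the dirtying path
bumps a counter and, past a tunable high-water mark, kicks the
daemon instead of stalling everything behind a sync:

#include <stdio.h>

static int nr_dirty;            /* dirty buffers outstanding */
static int nr_buffers = 1000;   /* buffers in the cache (invented) */
static int nfract = 60;         /* invented tunable: wake at 60% dirty */

static void wakeup_flush_daemon(void)
{
        printf("daemon woken at %d dirty buffers\n", nr_dirty);
        nr_dirty -= 500;        /* pretend it wrote back a batch */
}

static void mark_dirty_toy(void)
{
        if (++nr_dirty * 100 > nr_buffers * nfract)
                wakeup_flush_daemon();
}

int main(void)
{
        int i;

        for (i = 0; i < 2000; i++)
                mark_dirty_toy();
        return 0;
}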

The main thing I want to avoid is having to stall for the whole
writeback to complete before being able to restart the shrink_mmap()
loop, and taking advantage of the batched writes of bdflush seems a
natural way to do that.
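
In toy form once more (made-up names; a trivial reclaim pass
stands in for shrink_mmap()), the loop I have in mind alternates
a bounded flush with reclaim, picking up freed pages after every
batch rather than after the whole writeback:

#include <stdio.h>

#define BATCH 100               /* invented per-pass write-back limit */

int main(void)
{
        int dirty = 1000, clean = 0, free_pages = 0;

        while (free_pages < 500 && dirty > 0) {
                int n = dirty < BATCH ? dirty : BATCH;

                dirty -= n;             /* bounded write-back... */
                clean += n;
                free_pages += clean;    /* ...then reclaim what came clean */
                clean = 0;
                printf("dirty=%d free=%d\n", dirty, free_pages);
        }
        return 0;
}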

Anyway, I'll experiment...

--Stephen
