Re: Linux-2.2.4 testpatch..

Chuck Lever (cel@monkey.org)
Thu, 25 Mar 1999 02:39:42 -0500 (EST)


On 24 Mar 1999, Linus Torvalds wrote:
> There's a potential problem under certain load conditions with 2.2.4,
> which should be fixed by this one-liner. If you have trouble with 2.2.4
> (apart from the obvious one where you have to add ", NULL" to two places
> in kernel/acct.c), would you try this simple thing?
>
> Note that I _really_ want to hear about it if this makes any difference
> at all. It's a real bug, but I think it needs some rather pathological
> usage patterns to actually become a real problem. So I'd like to know
> if it actually makes a difference in real life...

when i first considered this, i agreed with your reasoning, and thought
that it would be a good change. however, after trying it under load, i
discovered that leaving b_count as 1 for free buffers actually *helps*
performance, and doesn't appear to cause the memory shortage problems you
feared. btw, i'm genuinely curious: what pathological conditions do you
think might cause a catastrophic memory shortage?

setting b_count to 0 for free buffers allows try_to_free_buffers() to free
more pages-- this is what you wanted it to do. but it's actually bad
behavior in terms of performance, since the pages it frees are quite
arbitrary in relation to which buffers are most used. in other words,
setting b_count to 0 when freeing buffers exacerbates the fact that the
Linux buffer cache has no buffer reclamation policy right now. leaving
b_count set to 1 reduces the likelihood that a page will be stolen, thus
preserving the contents of the buffer cache.
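
to make the mechanism concrete, here is a rough toy model, not the real
fs/buffer.c code. it assumes a page can be released only when every buffer
on it is clean and has a zero reference count; the names toy_buffer,
toy_page, toy_mark_free() and toy_page_freeable() are all made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFS_PER_PAGE 4   /* hypothetical: e.g. 1K buffers on a 4K page */

/* toy stand-in for a buffer head; NOT the kernel's struct buffer_head */
struct toy_buffer {
    unsigned int b_count;  /* reference count; nonzero means "busy" */
    bool dirty;            /* still needs to be written back */
};

struct toy_page {
    struct toy_buffer bufs[BUFS_PER_PAGE];
};

/* put every buffer on the page into the "free" state, using the given
 * reference count for free buffers (the value under debate: 0 or 1) */
static void toy_mark_free(struct toy_page *page, unsigned int free_count)
{
    assert(page != NULL);
    for (size_t i = 0; i < BUFS_PER_PAGE; i++) {
        page->bufs[i].b_count = free_count;
        page->bufs[i].dirty = false;
    }
}

/* assumed rule: the page may be released only if no buffer on it is
 * referenced or dirty */
static bool toy_page_freeable(const struct toy_page *page)
{
    assert(page != NULL);
    for (size_t i = 0; i < BUFS_PER_PAGE; i++) {
        if (page->bufs[i].b_count != 0 || page->bufs[i].dirty)
            return false;
    }
    return true;
}
```

in this model, toy_mark_free(&page, 1) leaves the page unfreeable, while
toy_mark_free(&page, 0) makes it eligible immediately, regardless of how
recently its buffers were used. that is the "arbitrary" part.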

under load, vmstat shows twice as many blocks read in ("bi") when b_count
is set to 0 for free buffers, and application throughput falls off more
steeply as offered load increases than it does on a kernel that leaves
b_count alone.
one can also see the buffer cache size fluctuating significantly when the
file working set and the page working set don't all fit in memory. this,
i argue, is pathological to performance, since it means pages are flowing
in and out of the buffer cache quickly, and are not staying in long enough
to be of use.

in the big picture, shrinking either the page cache or the buffer cache
will result in lower hit rates and worse (i.e. closer to disk speed)
performance. you really want to be careful to throw out the oldest or
least used page/buffer, because to do otherwise wastes disk bandwidth, and
that hurts *both* file system *and* VM performance.
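
the hit rate argument can be put as simple arithmetic. this is only a
back-of-the-envelope sketch: effective_access_us() is a made-up helper,
and the costs you feed it are assumptions, not measurements.

```c
#include <assert.h>

/* average cost of one access, given a cache hit rate in [0.0, 1.0],
 * the cost of a hit (memory) and the cost of a miss (disk), both in
 * microseconds. purely illustrative. */
static double effective_access_us(double hit_rate, double mem_us,
                                  double disk_us)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * mem_us + (1.0 - hit_rate) * disk_us;
}
```

with hypothetical costs of 1 us for a hit and 10000 us for a miss, going
from a 99% to a 90% hit rate makes the average access roughly ten times
more expensive, which is why evicting the wrong buffers costs so much.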

setting b_count to zero for free buffers might be the right thing to do,
but it makes more urgent the need to fix the reclamation problem.
successfully stealing pages from the buffer cache right now is very hard
on file system performance. perhaps adding some logic to regularly supply
the free list with the oldest buffers, and allowing try_to_free_buffers()
only to free pages containing buffers on the free list, might be a good
solution. perhaps calling try_to_free_buffers() or wakeup_bdflush()
should automatically reclaim old buffers, thereby slowing down cache
growth and reducing the need to steal pages from it. or what would happen
if the buffer cache, and not shrink_mmap(), could choose which page gets
freed/stolen?
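
the first idea above (feed the free list with the oldest buffers, and
only steal pages whose buffers are all on the free list) might look
something like this. it is a toy sketch under stated assumptions, not a
patch: toy_buf, its last_used stamp, toy_refill_free_list() and
toy_page_stealable() are hypothetical, and a real version would walk the
existing LRU lists rather than scan an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUFS_PER_PAGE 2   /* hypothetical; keeps the example small */

/* toy buffer: NOT the kernel's struct buffer_head */
struct toy_buf {
    unsigned long last_used;  /* assumed age stamp; larger = more recent */
    bool on_free_list;
};

/* move up to n of the least recently used buffers onto the free list.
 * returns how many were actually moved. */
static size_t toy_refill_free_list(struct toy_buf *bufs, size_t nbufs,
                                   size_t n)
{
    size_t moved = 0;

    assert(bufs != NULL);
    while (moved < n) {
        struct toy_buf *oldest = NULL;

        /* linear scan for the oldest buffer not yet on the free list */
        for (size_t i = 0; i < nbufs; i++) {
            if (bufs[i].on_free_list)
                continue;
            if (oldest == NULL || bufs[i].last_used < oldest->last_used)
                oldest = &bufs[i];
        }
        if (oldest == NULL)
            break;            /* everything is already on the free list */
        oldest->on_free_list = true;
        moved++;
    }
    return moved;
}

/* page number `page` covers bufs[page * BUFS_PER_PAGE] onward. it may be
 * stolen only if every buffer on it has already aged onto the free list. */
static bool toy_page_stealable(const struct toy_buf *bufs, size_t page)
{
    assert(bufs != NULL);
    for (size_t i = 0; i < BUFS_PER_PAGE; i++) {
        if (!bufs[page * BUFS_PER_PAGE + i].on_free_list)
            return false;
    }
    return true;
}
```

the point of the design is that the page stealer never picks a victim on
its own; it can only take what the buffer cache's aging has already given
up, so recently used buffers stay put.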

the best solution is to figure out how to allow page stealing while not
disturbing the LRU queues in the buffer cache.

- Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@netscape.net> or <cel@monkey.org>

The Linux Scalability project: http://www.citi.umich.edu/projects/citi-netscape/
