> I've added a small change to your patch. It simply allows growing
> buffers with GFP_BUFFER, but only if absolutely necessary. I've found this
> solution experimentally by playing with some conditions and with the
> positions of the wakeup_bdflush(1) calls in refill_freelist. This was
> tested by running two `bonnie -s 200' and two `make clean; make -j zImage' in loops.
>
> The patch is against pre-patch-2.0.31-7 plus Gadi's deadlock patch ...
> it's one of the fastest kernels I've ever seen. ... But testers are
> also needed for this patch _before_ the next pre-patch release
> or the real 2.0.31 ... anybody out there?
>
>
> Werner
I would avoid GFP_ATOMIC allocations in grow_buffers() altogether, and reserve
those last few pages only for interrupt handler clients, for the buffer heads
used as labels for swapping requests in 2.0.x, etc.
The buffer cache can potentially claim most of the system memory; declaring
those last few pages "completely off limits" to it would not affect it very
much, but would greatly help the other subsystems that desperately depend
on them.
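A minimal sketch of that reservation policy, assuming a single counter of free pages and two allocation classes. The names ALLOC_BUFFER, ALLOC_ATOMIC, struct page_pool and pool_alloc() are hypothetical and do not correspond to the 2.0.x allocator; the point is only that buffer-cache growth stops at a floor, while atomic clients may use the reserved pages below it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocation classes, loosely modelled on the idea of
 * GFP priorities; these are illustrative names, not kernel flags. */
enum alloc_class {
    ALLOC_BUFFER,   /* buffer cache growth (grow_buffers) */
    ALLOC_ATOMIC    /* interrupt handlers, swap-request buffer heads */
};

struct page_pool {
    int free_pages;     /* pages currently free */
    int reserved_pages; /* last few pages kept off limits to the buffer cache */
};

/* Try to take one page. Buffer-cache growth must leave the reserve
 * untouched; atomic clients may consume the reserve down to zero. */
static bool pool_alloc(struct page_pool *pool, enum alloc_class cls)
{
    assert(pool->reserved_pages >= 0);
    int limit = (cls == ALLOC_BUFFER) ? pool->reserved_pages : 0;
    if (pool->free_pages <= limit)
        return false;   /* buffer cache must wait for bdflush/kswapd instead */
    pool->free_pages--;
    return true;
}
```

With free_pages = 5 and reserved_pages = 3, two buffer-cache allocations succeed and the third fails, leaving exactly the three reserved pages for atomic clients.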
Using GFP_ATOMIC while avoiding a call to wakeup_bdflush(1) on each cycle
can leave the system with zero free pages under the following two conditions:
-- The 2.0.x kernels contain the following in find_candidate():
    if (buffer_locked(bh) && bh->b_list == BUF_LOCKED ...) {
            /* Buffers are written in the order they are placed
               on the locked list. If we encounter a locked buffer
               here, this means that the rest of them are also locked */
            (*list_len) = 0;
            return NULL;
    }
This is incorrect. In the 2.0.x kernels, we can get into a
situation in which most of the BUF_LOCKED list is reclaimable,
but we cannot reclaim it because a single locked buffer sits
at the front of the list.
The "nr_buffers_type[BUF_DIRTY] > 60%" check will not trigger
in this case, since most of the buffers will be on the LOCKED
list rather than the DIRTY list, so we fall through to the
GFP_ATOMIC calls.
This was fixed in the 2.1.x kernels by traversing the entire list.
I think this fix potentially has a lot of CPU overhead: in many
cases the above "if we encounter a locked buffer, all of them
are locked" argument *is* correct, and in those cases we will
traverse more than 10000 elements on each cycle in search of
a single buffer, only to fail.
-- If the buffer cache is currently very small and the page
cache is very big (for example, after tar cvf /dev/null
/usr/), we can again end up in the GFP_ATOMIC allocations.
In that case, wakeup_bdflush(1) did not really help by flushing
dirty buffers. Its actual effect was simply to sleep for a
while until kswapd() moved pages from the page cache into
the free page pool.
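The trade-off in the first case can be sketched by comparing the two scan strategies on a simplified locked list. This is a minimal model, not find_candidate() itself: struct buf, its fields, scan_stop_at_locked() and scan_full_list() are hypothetical names, and everything else the real code checks is omitted. Each function reports how many list elements it examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a buffer head on BUF_LOCKED. */
struct buf {
    bool locked;       /* write still in flight */
    int count;         /* users; nonzero means not reclaimable right now */
    struct buf *next;
};

/* 2.0.x-style scan: stop at the first locked buffer, on the assumption
 * that buffers are written in list order, so the rest are locked too. */
static struct buf *scan_stop_at_locked(struct buf *head, size_t *examined)
{
    assert(examined != NULL);
    *examined = 0;
    for (struct buf *bh = head; bh != NULL; bh = bh->next) {
        (*examined)++;
        if (bh->locked)
            return NULL;        /* give up on the whole list */
        if (bh->count == 0)
            return bh;          /* reclaimable candidate */
    }
    return NULL;
}

/* 2.1.x-style scan: skip locked buffers and keep walking the list. */
static struct buf *scan_full_list(struct buf *head, size_t *examined)
{
    assert(examined != NULL);
    *examined = 0;
    for (struct buf *bh = head; bh != NULL; bh = bh->next) {
        (*examined)++;
        if (bh->locked || bh->count != 0)
            continue;           /* not reclaimable, try the next one */
        return bh;
    }
    return NULL;
}
```

With one locked buffer at the head followed by reclaimable ones, the early-exit scan gives up after a single element while the full scan finds a candidate. With a list of N buffers that are all locked, the early-exit scan still examines one element, but the full scan examines all N and fails, which is the per-cycle overhead described above.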
Gadi