Re: ext3 allocate-with-reservation latencies

From: Mingming Cao
Date: Thu Apr 07 2005 - 14:20:08 EST


On Thu, 2005-04-07 at 14:08 +0100, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, 2005-04-07 at 09:14, Ingo Molnar wrote:
>
> > doesnt the first option also allow searches to be in parallel?
>
> In terms of CPU usage, yes. But either we use large windows, in which
> case we *can't* search remotely near areas of the disk in parallel; or
> we use small windows, in which case failure to find a free bit becomes
> disproportionately expensive (we have to do an rbtree link and unlink
> for each window we search.)
>

I was thinking that at the end of find_next_reservable_window(), since
we already know the previous node in the rbtree to insert, we could just
link the window there, that way the rbtree link cost will be minimized.

Then I realize in rbtree, previous node is not necessary the parent
node, we still have to search part of the rbtree to find the parent
node, which is really expensive in the case the insertion operations are
very frequent. Hmmm...

> > This particular one took over 1 msec,

If you look at the pattern in the latency report:
.........
cvs-14981 0.n.2 1044us : find_next_zero_bit (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1045us : ext3_test_allocatable (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1045us : find_next_zero_bit (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1046us : find_next_zero_bit (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1047us : ext3_test_allocatable (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1048us : find_next_zero_bit (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1049us : find_next_zero_bit (bitmap_search_next_usable_block)
cvs-14981 0.n.2 1049us : ext3_test_allocatable (bitmap_search_next_usable_block)
........

the patten is: two calls to find_next_zero_bit() followed by a
ext3_test_allocatabl(). And we have about 440 iteration in this report.

That means we are really unlucky: Checked the on disk bitmap, found a
free bit, but it's not free in the journal copy, then we check the
journal copy, find a free bit, but it's not free on disk....I could
imagine we need to check the journal copy once or twice(to prevent re-
use the just freed bit before it is been commited), but how could this
many (440) iterations happen?


Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/