Re: [PATCH] Clustering indirect blocks in Ext3

From: Abhishek Rai
Date: Fri Jan 11 2008 - 09:49:39 EST


That would surely help sequential read performance for large
unfragmented files, and we have considered it before. However, there
are two main reasons why we want the data blocks and the corresponding
indirect blocks to share the same block group.

1. When a block group runs out of one type of block (data blocks or
indirect blocks), we use blocks of the other type for allocation.
Consequently, if data blocks and their corresponding indirect blocks
share the same block group, we run out of data blocks in that group at
exactly the same time as we run out of indirect blocks, so we know the
block group is well utilized and can move on to the next one. This
keeps things simple and results in low fragmentation. However, if data
blocks and their indirect blocks went into two different block groups,
you could run out of one kind of block in one block group while the
other kind is still available in the other block group, since the two
are now independent. We would then need to decide which kind of
allocation to move over to which block group. This requires slightly
more advanced heuristics, and I didn't want to add this complexity for
the small gain it offers.

2. I think sharing a block group the way it's done currently is a
cleaner design, since allocation is quite self-contained within a
block group. I'd argue that in the long run it's good to stick to a
cleaner design even if it is 1-2% worse in performance in some cases.
Among other things, cleaner designs are easier to change and enhance
in the future. More importantly, in this case our goal is to speed up
fsck without slowing down IO, and we are comfortably achieving that
goal.
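
To make the fallback in point 1 concrete, here is a minimal Python
sketch. It is a toy model and not code from the patch: the names
BlockGroup, alloc and allocate are hypothetical, and each region is
reduced to a free-block counter. It only illustrates that a group is
abandoned when both regions are exhausted, which happens at the same
moment for both kinds of allocation.

```python
class BlockGroup:
    """Toy block group: a data region plus a metacluster region that is
    preferred for indirect blocks. Only free-block counts are tracked."""

    def __init__(self, free_data, free_indirect):
        self.free = {"data": free_data, "indirect": free_indirect}

    def alloc(self, kind):
        """Allocate one block of `kind` ("data" or "indirect").

        Returns the name of the region the block came from, or None if
        the whole group is full. If the preferred region is exhausted,
        the other region of the same group is used instead."""
        other = "indirect" if kind == "data" else "data"
        for region in (kind, other):
            if self.free[region] > 0:
                self.free[region] -= 1
                return region
        return None


def allocate(groups, start, kind):
    """Try group `start` first; move to a later group only when the
    current one has no free block of either type."""
    for idx in range(start, len(groups)):
        region = groups[idx].alloc(kind)
        if region is not None:
            return idx, region
    return None  # no free block in any group from `start` onward
```

With data and indirect blocks in different groups, the single loop in
allocate() would no longer be enough: the allocator would have to
track two groups and decide which kind of allocation moves where.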

Thanks,
Abhishek

On Jan 11, 2008 9:12 AM, Bodo Eggert <7eggert@xxxxxx> wrote:
> Abhishek Rai <abhishekrai@xxxxxxxxxx> wrote:
>
> > Putting metacluster at the end of the block group gives slightly
> > inferior sequential read throughput compared to putting it in the
> > beginning or the middle, but the difference is very tiny and exists
> > only for large files that span multiple block groups.
>
> Just an idea:
>
> What about putting it into the end of the previous block group (except for
> the first group, of course) and starting to read the block group a little
> earlier (readahead/~before)? I imagine it might be about as good as placing
> it at the beginning while avoiding the fragmentation.
>
>