Andrew Morton wrote:
>
[...]
> ext2 and ext3 filesystems are carved up into "block groups", aka
> "cylinder groups". Each one is 4096*8 blocks - typically 128 MB.
> So you can easily have hundreds of blockgroups on a single partition.
>
> The inode allocator is designed to arrange that files which are within the
> same directory fall in the same blockgroup, for locality of reference.
>
> But new directories are placed "far away", in block groups which have
> plenty of free space. (find_group_dir -> find a blockgroup for a
> directory).
>
> The thinking here is that files in a separate directory are related,
> and files in different directories are unrelated. So we can take
> advantage of that heuristic - go and use a new blockgroup each time
> a new directory is created. This is a levelling algorithm which
> tries to keep all blockgroups at a similar occupancy level.
> That's a good thing, because high occupancy levels lead to fragmentation.
>
> find_group_other() is basically first-fit-from-start-of-disk, and
> if we use that for directories as well as files, your untar-onto-a-clean-disk
> simply lays everything out in a contiguous chunk.
>
> Part of the problem here is that it has got worse over time. The
> size of a blockgroup is hardwired to blocksize*bits-in-a-byte*blocksize.
> But disks keep on getting bigger. Five years ago (when, presumably, this
> algorithm was designed), a typical partition had, what? Maybe four
> blockgroups? Now it has hundreds, and so the "levelling" is levelling
> across hundreds of blockgroups and not just a handful.
If having only "a few" block groups really work better
(even for todays bigger disks) then bigger
block groups seems like a solution.
changing the on-disk format might not be popular, but there
is no need for that. Simply regard several on-disk block
groups as a bigger "allocation group" when using the above
algorithm. This should be perfectly backwards compatible.
Helge Hafting
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:25 EST